Re: Unicode strings and byte arrays
- From: "Karl E. Peterson" <karl@xxxxxxxx>
- Date: Fri, 6 May 2005 14:47:44 -0700
YYZ wrote:
> In doing more investigation over lunch, it seems that my text editor
> was helping me out a bit, but because I didn't know it, it was
> confusing me. A "bad" file looks like this in hex view:
>
> FF FE 69 00 66 00 20 00 65 00 78 00 69 00 73 00
> 74 00 73 00 20 00 28 00 73 00 65 00 6C 00 65 00
I remember seeing that signature, now that you post it! That's how Notepad stored
Unicode. No idea how "universal" it is. Oughta be, huh? <g>
I just tried saving "as Unicode" with TextPad, and that didn't add the signature,
fwiw. Resaving with Notepad added the sig. TextPad could still open it, too, btw.
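For anyone following along, here is a minimal sketch (in Python rather than VB, purely to illustrate the logic) of reading the bytes quoted above. It assumes the leading FF FE is a UTF-16 little-endian signature and that the rest of the buffer is well-formed two-byte text; the function name `decode_utf16_with_bom` is made up for this example.

```python
import codecs

# The 32 bytes copied from the hex view quoted above.
raw = bytes.fromhex(
    "FF FE 69 00 66 00 20 00 65 00 78 00 69 00 73 00 "
    "74 00 73 00 20 00 28 00 73 00 65 00 6C 00 65 00"
)


def decode_utf16_with_bom(data: bytes) -> str:
    """Decode a buffer that starts with a UTF-16 signature (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE: low byte first
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF: high byte first
        return data[2:].decode("utf-16-be")
    raise ValueError("no UTF-16 signature found")


text = decode_utf16_with_bom(raw)
print(repr(text))  # 'if exists (sele'
```

The second branch is only there because a signature can be written in either byte order; nothing in this thread shows a file saved that way.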
> all the even columns of hex codes. EXCEPT that return characters (0D
> 0A) aren't separated with 00 between them, which really messes
Oh that's just *really* weird! Here, they're two bytes each, just like everything else! I
guess every editor is using a different standard. Lovely, huh?
>>> Does the IsTextUnicode API call do any good? I've seen it but never
>>> needed to try it.
>>>
>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81np.asp
>>
>> There ya go! I *thought* I'd run across that at some point, but sure as
>> heck couldn't find it in MSDN myself. Gotta love the description:
>>
>> "The IsTextUnicode function determines whether a buffer is likely to
>> contain a form of Unicode text. The function uses various statistical
>> and deterministic methods to make its determination, ..."
>>
>> So, not even Microsoft knows of a good way to really tell. <g>
>
> No kidding!
Looks like your best bet, though.
> I did find out that I can use this:
> IsTextUnicode(btByte(0), lLen, IS_TEXT_UNICODE_SIGNATURE)
> and the retval will be 0 for a pure ASCII file, and <> 0 for one of my
> messed up files -- that Unicode signature evidently is added to the
> beginning of all Unicode files as 0xFEFF -- assuming the app that
> saved it like that plays by the rules. So far it works fine. Now I
> just have to write the function to copy selected elements of the byte
> array.
If it weren't for the odd Cr/Lf pairs, I'd say you're on your way. Would it be
possible to get these goobs to use normal ANSI when they save? <g>
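One possible shape for the "copy selected elements of the byte array" step, sketched in Python just to show the logic. It is not YYZ's actual function: the name `to_single_byte_text` is invented here, and the approach assumes every character in the file is plain ASCII (so every high byte is 00) and that the text contains no real NUL characters. Under those assumptions, dropping the signature and every 00 byte also copes with CR/LF pairs that were written without the 00 padding.

```python
UTF16_LE_SIGNATURE = b"\xff\xfe"  # 0xFEFF as stored on disk, low byte first


def to_single_byte_text(data: bytes) -> bytes:
    """Return single-byte text from a buffer that may carry a UTF-16 LE signature.

    Hypothetical helper; assumes ASCII-range content only.
    """
    if not data.startswith(UTF16_LE_SIGNATURE):
        # No signature: treat the buffer as ordinary single-byte text.
        return data
    # Skip the two signature bytes, then keep every byte that is not 00.
    # Padded (0D 00 0A 00) and unpadded (0D 0A) line ends both survive.
    return bytes(b for b in data[2:] if b != 0)


padded = b"\xff\xfeh\x00i\x00\r\x00\n\x00"
unpadded = b"\xff\xfeh\x00i\x00\r\n"
print(to_single_byte_text(padded))    # b'hi\r\n'
print(to_single_byte_text(unpadded))  # b'hi\r\n'
```

Anything outside the ASCII range would come out mangled, since a non-zero high byte would be kept as a stray character, so this is only a stopgap for files known to hold plain English text.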
Have fun... Karl
--
Working Without a .NET?
http://classicvb.org/petition