Re: Opening a text file that may be ASCII *or* Unicode
- From: "Joe Earnest" <jearnest3-SPAM@xxxxxxxxxxxxx>
- Date: Sun, 19 Jun 2005 07:10:36 -0600
Hi Andrew,
[top post]
Michael Harris posted something very similar back in 2001. AFAIK, this
is the only way to determine the file's encoding before opening it for real.
Here's MH's post from 2001:
http://groups-beta.google.com/group/microsoft.public.scripting.vbscript/browse_frm/thread/628f93f8430000a5/66d14306ff6c925c?q=unicode+255+254+group:microsoft.public.scripting.*+author:Michael+author:Harris&rnum=1&hl=en#66d14306ff6c925c
There is a fly in the ointment, however, even in MH's post. There are at
least five different Unicode BOMs that signal how the file should be interpreted:
UTF-8: EF BB BF
UTF-16, Big-Endian: FE FF
UTF-16, Little-Endian: FF FE
UTF-32, Big-Endian: 00 00 FE FF
UTF-32, Little-Endian: FF FE 00 00
Your technique distinguishes the two most common cases for Western users:
plain ASCII and UTF-16 little-endian (FF FE). Once you've opened your file,
however, it's neither time-consuming nor much additional code to read the
first four bytes and test for all of these. I have a WSC routine that's
called by a file open-for-reading method to do that. (A small suggestion for
your posted code: either error-trap or get the file size first, to ensure
that the file contains at least as many bytes as you're reading. It could
well be an empty ASCII file, with no bytes at all.)
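To illustrate the four-byte test and the size check together, here is a minimal sketch. It is not the WSC routine mentioned above; DetectBom is a hypothetical name made up for this example. It assumes a single-byte ANSI code page such as Windows-1252, where Asc() of a character read in ASCII mode gives back the original byte value. On a double-byte (DBCS) system that assumption may not hold, and reading the bytes through ADODB.Stream in binary mode would be the safer route.

```vbscript
Option Explicit

' Hypothetical helper (not from this thread): returns the encoding name
' implied by the file's BOM, or "" when no BOM is present.
' Assumes a single-byte ANSI code page, so Asc() yields the raw byte value.
Function DetectBom(strFileName)
    Const ForReading = 1, TriStateFalse_ASCII = 0
    Dim oFso, oFile, intCount, i, b(3)

    ' -1 marks "byte not present", so short files never match a BOM
    For i = 0 To 3
        b(i) = -1
    Next

    Set oFso = CreateObject("Scripting.FileSystemObject")

    ' Get the size first: never read more bytes than the file holds
    intCount = oFso.GetFile(strFileName).Size
    If intCount > 4 Then intCount = 4

    If intCount > 0 Then
        Set oFile = oFso.OpenTextFile(strFileName, ForReading, _
            False, TriStateFalse_ASCII)
        For i = 0 To intCount - 1
            b(i) = Asc(oFile.Read(1))
        Next
        oFile.Close
    End If

    ' Test the 4-byte UTF-32 marks before the 2-byte UTF-16 marks,
    ' because UTF-32 LE (FF FE 00 00) also begins with FF FE
    If b(0) = 255 And b(1) = 254 And b(2) = 0 And b(3) = 0 Then
        DetectBom = "UTF-32LE"
    ElseIf b(0) = 0 And b(1) = 0 And b(2) = 254 And b(3) = 255 Then
        DetectBom = "UTF-32BE"
    ElseIf b(0) = 239 And b(1) = 187 And b(2) = 191 Then
        DetectBom = "UTF-8"
    ElseIf b(0) = 255 And b(1) = 254 Then
        DetectBom = "UTF-16LE"
    ElseIf b(0) = 254 And b(1) = 255 Then
        DetectBom = "UTF-16BE"
    Else
        DetectBom = ""
    End If
End Function
```

A caller could then open the file with TriStateTrue_Unicode when DetectBom(strFileName) returns "UTF-16LE" and with TriStateFalse_ASCII when it returns "". As far as I know, OpenTextFile's Unicode mode only reads UTF-16 little-endian, so the other three results are mainly useful for reporting the encoding or for choosing a different reader. One caveat: a UTF-16 LE file whose first character is a NUL would be reported as UTF-32 LE, which should be rare in ordinary text files.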
You might want to take a look at these --
UTF & BOM
http://www.unicode.org/faq/utf_bom.html
(for the BOM table, scroll down to the Byte Order Mark heading)
Joel Spolsky, The Absolute Minimum Every Software Developer Absolutely,
Positively Must Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.html
(thanks to mayayana for this one)
Regards,
Joe Earnest
"Andrew Aronoff" <NOSPAM_WRONG.ADDRESS@xxxxxxxxx> wrote in message
news:pigab19o4s8tqc5h9h3ibtb01eofdiohtf@xxxxxxxxxx
> Since I can't find any documentation about TriStateUseDefault, I
> decided to open the file in ASCII; read the first two characters;
> close the file; compare those characters to 255 & 254; if true, open
> in Unicode, otherwise open in ASCII.
>
>
> Const ForReading = 1
> Const TriStateFalse_ASCII = 0, TriStateTrue_Unicode = -1
>
> 'strFileName points to a text file in ASCII or Unicode
> Set oTextFile = Fso.OpenTextFile (strFileName, ForReading, _
> False,TriStateFalse_ASCII)
>
> 'read 1st 2 chrs, find Asc chr code
> intAsc1Chr = Asc(oTextFile.Read(1))
> intAsc2Chr = Asc(oTextFile.Read(1))
>
> oTextFile.Close
>
> If intAsc1Chr = 255 And intAsc2Chr = 254 Then
>
> 'open the file in Unicode
> Set oTextFile = Fso.OpenTextFile (strFileName,ForReading, _
> False,TriStateTrue_Unicode)
>
> Else
>
> 'open the file in ASCII
> Set oTextFile = Fso.OpenTextFile (strFileName,ForReading, _
> False,TriStateFalse_ASCII)
>
> End If
>
>
> It's not elegant, but it seems to work.
>
> regards, Andy
> --
> **********
>
> Please send e-mail to: usenet (dot) post (at) aaronoff (dot) com
>
> To identify everything that starts up with Windows, download
> "Silent Runners.vbs" at www.silentrunners.org
>
> **********