Re: unicode file




"Mihai N." <nmihai_year_2000@xxxxxxxxx> wrote in message
news:Xns9AC67703E353MihaiN@xxxxxxxxxxxxxxxx

If you use VS8, take a look at _open:
http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.80).aspx
You can specify _O_TEXT, _O_U16TEXT, _O_U8TEXT or _O_WTEXT

And the nicest part about _O_WTEXT:
"If _O_WTEXT is used to open a file for reading, _open reads the
beginning of the file and check for a byte order mark (BOM).
If there is a BOM, the file is treated as UTF-8 or UTF-16LE
depending on the BOM. If no BOM is present, the file is treated
as ANSI. When a file is opened for writing using _O_WTEXT, UTF-16
is used. If _O_UTF8 is used, the file is always opened as UTF-8
and if _O_UTF16 is used, the file is always opened as UTF-16
regardless of any previous setting or byte order mark."

No extra libraries (sorry Giovanni :-) and no need to do your own
conversion (sorry Tom :-)
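
For readers who want to see the BOM rule quoted above spelled out, here is a minimal portable sketch. It is not how the CRT implements _O_WTEXT; the names TextEncoding and DetectEncodingFromBom are hypothetical, and the function only looks at the first bytes of a buffer you have already read.

```cpp
#include <cstddef>
#include <string>

// The three outcomes named in the quoted documentation.
enum class TextEncoding { Ansi, Utf8, Utf16LE };

// Hypothetical helper (not a CRT function): classify a buffer by its
// leading byte order mark, following the rule quoted for _O_WTEXT reads.
inline TextEncoding DetectEncodingFromBom(const std::string& head)
{
    const auto byteAt = [&head](std::size_t i) {
        return static_cast<unsigned char>(head[i]);
    };

    // UTF-8 BOM: EF BB BF
    if (head.size() >= 3 &&
        byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return TextEncoding::Utf8;

    // UTF-16LE BOM: FF FE
    // (A UTF-32LE BOM starts with the same two bytes; this sketch does
    // not try to tell them apart.)
    if (head.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return TextEncoding::Utf16LE;

    // No BOM: the quoted rule treats the file as ANSI.
    return TextEncoding::Ansi;
}
```

Typical use would be to read the first three bytes of the file in binary mode and pass them to DetectEncodingFromBom before deciding how to decode the rest.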

I'm glad that VC8 introduced that Unicode option.

However, the extra library I wrote (nothing special, of course) is useful
for its simplicity, and because it gives an RAII approach to Unicode UTF-8
text file management.

For example, to write text, you can just do:

<code>

UTF8TextFileWriter writer( L"filename..." );
writer.WriteLine( ...some unicode UTF-16 string );

</code>

This is simpler than calling _open, specifying some flags, converting
UTF-16 -> UTF-8, closing the file, etc.
Moreover, if you pass a UTF-16 string to the VC8-specific functions, is
the string automatically converted to UTF-8 before it is written to the file?
My small library does the UTF-16 to UTF-8 conversion behind the scenes.
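
To make the idea concrete, here is a rough sketch of what such an RAII writer could look like. It is not the code of my library: Utf16ToUtf8 and Utf8LineWriter are hypothetical names, and to keep the sketch portable it assumes std::u16string and a narrow path (on Windows you would more likely use wchar_t strings and _wfopen).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Encode UTF-16 as UTF-8. Unpaired surrogates are replaced by U+FFFD.
inline std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high one
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical RAII writer: the constructor opens the file and the
// destructor closes it, so the caller never calls close explicitly.
class Utf8LineWriter {
public:
    explicit Utf8LineWriter(const std::string& path)
        : file_(std::fopen(path.c_str(), "wb"))
    {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }
    ~Utf8LineWriter() { std::fclose(file_); }

    Utf8LineWriter(const Utf8LineWriter&) = delete;
    Utf8LineWriter& operator=(const Utf8LineWriter&) = delete;

    // Convert the UTF-16 line to UTF-8 and append a newline.
    void WriteLine(const std::u16string& line)
    {
        const std::string bytes = Utf16ToUtf8(line) + "\n";
        if (std::fwrite(bytes.data(), 1, bytes.size(), file_) != bytes.size())
            throw std::runtime_error("write failed");
    }

private:
    std::FILE* file_;
};
```

With this, writing a line is two statements: construct a Utf8LineWriter with a path, then call WriteLine with the UTF-16 text. The file is closed when the writer goes out of scope.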

And similar things for reading (UTF-8 text is converted back to UTF-16,
automatically; the programmer does not need to pay attention to these
details - he just needs to call a ReadLine method, and he gets a UTF-16
string).
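
The reading side can be sketched in the same spirit. Again, this is only an illustration under assumptions, not my library's source: Utf8ToUtf16 and Utf8LineReader are hypothetical names, the text is held in std::u16string for portability, malformed UTF-8 is replaced by U+FFFD, and a leading BOM is not stripped.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-16. Each malformed byte becomes U+FFFD.
inline std::u16string Utf8ToUtf16(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const std::uint32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0xFFFD;  // default for an invalid lead byte
        std::size_t len = 1;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        // Fold in the continuation bytes (10xxxxxx).
        bool ok = (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }

        // Reject truncated, overlong, surrogate and out-of-range values.
        if (!ok || (len > 1 && cp < minForLen[len]) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
            len = 1;  // resynchronise on the next byte
        }

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}

// Hypothetical reader: std::ifstream closes the file in its destructor.
class Utf8LineReader {
public:
    explicit Utf8LineReader(const std::string& path)
        : in_(path, std::ios::binary)
    {
        if (!in_) throw std::runtime_error("cannot open " + path);
    }

    // Returns false at end of file; otherwise fills `line` with UTF-16 text.
    bool ReadLine(std::u16string& line)
    {
        std::string raw;
        if (!std::getline(in_, raw)) return false;
        if (!raw.empty() && raw.back() == '\r') raw.pop_back();  // CRLF files
        line = Utf8ToUtf16(raw);
        return true;
    }

private:
    std::ifstream in_;
};
```

The caller just loops on ReadLine until it returns false and never sees the UTF-8 bytes.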

Of course, my library is limited to the particular case that I use:
UTF-8 in external files, and UTF-16 inside the app.
(It does not manage UTF-16 LE/BE files, or UTF-32 files.)

Thanks,
Giovanni





