Re: unicode file



Thanks for that information, Mihai. If it works, that would be a great addition to the library.

I appreciate the link.

Tom

"Mihai N." <nmihai_year_2000@xxxxxxxxx> wrote in message news:Xns9AC67703E353MihaiN@xxxxxxxxxxxxxxxx
How can I know if the file is Unicode or ANSI,
and if it is ANSI, how can I convert it to Unicode?

Adding even more to the answers from Tom and Giovanni :-)

If you use VS8, take a look at _open:
http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.80).aspx
You can specify _O_TEXT, _O_U16TEXT, _O_U8TEXT or _O_WTEXT

And the nicest part about _O_WTEXT:
"If _O_WTEXT is used to open a file for reading, _open reads the
beginning of the file and check for a byte order mark (BOM).
If there is a BOM, the file is treated as UTF-8 or UTF-16LE
depending on the BOM. If no BOM is present, the file is treated
as ANSI. When a file is opened for writing using _O_WTEXT, UTF-16
is used. If _O_UTF8 is used, the file is always opened as UTF-8
and if _O_UTF16 is used, the file is always opened as UTF-16
regardless of any previous setting or byte order mark."

No extra libraries (sorry Giovanni :-) and no need to do your own conversion
(sorry Tom :-)
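
For anyone curious what that BOM check amounts to, here is a minimal portable C sketch of the rule quoted above. It is not the CRT's implementation, only an illustration: text_encoding, detect_bom and detect_file_encoding are names made up for this example. It assumes "no BOM" simply means ANSI, exactly as the quoted text says, even though a BOM-less file could in practice still be UTF-8.

```c
#include <stdio.h>

/* Hypothetical names for this sketch; none of them are part of the CRT. */
typedef enum { ENC_ANSI, ENC_UTF8, ENC_UTF16LE } text_encoding;

/* Classify a buffer by its byte order mark, following the quoted rule:
   UTF-8 BOM -> UTF-8, UTF-16LE BOM -> UTF-16LE, no BOM -> ANSI. */
static text_encoding detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;      /* EF BB BF */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;   /* FF FE */
    return ENC_ANSI;          /* assumption: no BOM means ANSI */
}

/* Read up to the first three bytes of a file and classify them.
   Returns 0 on success, -1 if the file cannot be opened. */
static int detect_file_encoding(const char *path, text_encoding *out)
{
    unsigned char head[3];
    size_t n;
    FILE *f = fopen(path, "rb");  /* binary mode: no newline translation */

    if (f == NULL)
        return -1;
    n = fread(head, 1, sizeof head, f);
    fclose(f);
    *out = detect_bom(head, n);
    return 0;
}
```

With VS8 you would not need any of this for reading: passing _O_WTEXT to _open does the check for you, as described above. A hand-rolled check like this one is mainly useful if you only want to report which kind of file you have.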


--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
