Re: Is WideCharToMultiByte(...) works fine If unicode char is more than 2 byte???

> I agree, but if we have _mbslen for 8-bit shouldn't we also have
> something for 16-bit? Even if people won't use it!
_mbslen has the same problems with ligatures and such.
But yes, what we would probably need is a "linguistic character length"
function.
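
To make the "something for 16-bit" idea concrete, here is a minimal sketch of a counter for UTF-16, roughly the counterpart of what _mbslen does for multibyte strings. The function name utf16_codepoint_count is made up for this example; it is not a CRT or Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CRT or Windows function).
 * Counts Unicode code points in a UTF-16 buffer of `len` code units.
 * A high surrogate (0xD800-0xDBFF) followed by a low surrogate
 * (0xDC00-0xDFFF) counts as one code point; an unpaired surrogate
 * is counted as one unit on its own. */
size_t utf16_codepoint_count(const uint16_t *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i += 2;             /* well-formed surrogate pair */
        } else {
            i += 1;             /* BMP unit or lone surrogate */
        }
        count++;
    }
    return count;
}
```

For the five code units 0x0041 0xD834 0xDD1E 0x0065 0x0301 ("A", U+1D11E as a surrogate pair, "e", combining acute accent) this returns 4. A reader would probably see three characters, which is why counting code points is still not a "linguistic character length".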

> I'm not sure there is anything missing that I need; I was just trying to
> find out if WideCharToMultiByte() was UCS-2 or UTF-16, and was puzzled
> that the MS documentation does not seem to address this issue. For my
> application it is extremely unlikely to matter.
This is usually the case :-)
But the short answer: it is UCS-2 on WinNT & Win9x and UTF-16 on newer systems.
And the only code page where it matters is GB18030, which is not supported at
all on old systems.
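
What "UCS-2 vs. UTF-16" means for a converter can be sketched like this. The routine below is an illustration only, not how WideCharToMultiByte is implemented, and the name utf16_to_utf8 is made up. UTF-8 is used as the target just because its encoding rules are short; no GB18030 mapping is attempted. A UTF-16-aware converter combines a surrogate pair into one code point before encoding it, while a UCS-2 converter would look at each 16-bit unit in isolation. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts UTF-16 to UTF-8.
 * Returns the number of bytes written, or (size_t)-1 if `out` is too small.
 * Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch). */
size_t utf16_to_utf8(const uint16_t *in, size_t in_len,
                     unsigned char *out, size_t out_cap)
{
    size_t o = 0;

    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in_len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* UTF-16 step: merge high + low surrogate into one code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;        /* lone surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need > out_cap)
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

With the input 0xD834 0xDD1E (U+1D11E) this writes the four bytes F0 9D 84 9E. Text that stays inside the BMP converts the same way under either interpretation, which is why the difference so rarely matters in practice.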

> Sorry, I didn't mean to imply that MS (or others) were idiots, but I do
> think they might not have made this decision if they had known that
> Unicode would expand beyond UCS-2. I believe there are Unix/Linux
> systems where the operating system uses UTF-8 and wchar_t is 32 bits,
> and I feel that this might be easier to use than UTF-16.
I still think UTF-16 is a good choice, even knowing that 16 bits is not
enough. ICU and Xerces made this choice quite recently, and the link
to the Unicode FAQ explains some of the "why".
But yes, most Unix/Linux systems use UTF-8 in the kernel, file systems, and
other places where the encoding does not matter and being agnostic works OK.
And wchar_t is 32 bits on many Linux systems, and some popular libraries use
it (for example wxWidgets).
But the basic principle is still the same:
- UTF-7 and UTF-8 are good for not breaking legacy applications/libraries and
for moving stuff around
- UTF-16 and UTF-32 are good when the strings have to be processed
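
A small sketch of the first point. In UTF-8 every byte of a multibyte sequence has the high bit set, so an ASCII byte such as '/' can never show up inside one, and byte-oriented legacy code that only looks for ASCII delimiters keeps working untouched. The function last_path_component is a made-up name for this example.

```c
#include <string.h>

/* Hypothetical legacy-style helper that knows nothing about UTF-8.
 * Returns a pointer to the part of `path` after the last '/',
 * or `path` itself if it contains no '/'. */
const char *last_path_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

Passing a UTF-8 path whose directory and file names contain non-ASCII characters still gives back the intact file name, because the search can only match a real '/'. The same code is not safe on some legacy double-byte code pages, where a trail byte can fall in the ASCII range.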

> Anyway, I think I understand. Later versions of Windows (XP in
> particular) support UTF-16, but earlier ones are just UCS-2.
Right :-)


--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email