Re: L macro



Actually, Linux didn't need to change in that regard. It simply
adopted UTF-8. Still, GCC on Linux does have wchar_t, and it is 32-bit,
i.e. it can hold UTF-32. The only advantage UTF-16 has over UTF-8 is
shorter strings for text made up mostly of common Far Eastern characters:
2 bytes per character vs 3 bytes in UTF-8. UTF-8 has the advantage over
UTF-16 for ASCII characters, though: a single byte per character vs
2 bytes in UTF-16.

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnickolov@xxxxxxxx
MVP VC FAQ: http://vcfaq.mvps.org
=====================================

"Bo Persson" <bop@xxxxxx> wrote in message
news:5j0furF3rng1hU1@xxxxxxxxxxxxxxxxxxxxx
Giovanni Dicanio wrote:
:: "Ulrich Eckhardt" <eckhardt@xxxxxxxxxxxxxx> wrote in
:: message news:84kqp4-9pd.ln1@xxxxxxxxxxxxxxxxxxxxxxxxx
::
::: because on win32 wchar_t
::: doesn't support all Unicode chars in one element.
::
:: My understanding is that on Win32 wchar_t is used to store Unicode
:: UTF-16 code units. So, some Unicode characters must be stored using
:: "surrogate pairs" (i.e. a pair of 16-bit wchar_t values).
:: I wonder why the C++ standard committee did not make wchar_t 32
:: bits on every platform, or at least disambiguate using wchar16_t
:: and wchar32_t (or better: utf16char_t and utf32char_t).
::

When some compilers decided to make wchar_t 16 bits, all existing Unicode
code points did fit in 16 bits. Only later did it become clear that you
should have selected a data type of at least 21 bits.

The standards committee cannot dictate changes, only document current
practice or invent entirely new features. Standardizing one size didn't
work for wchar_t, because neither Windows (16 bits) nor Linux (32 bits)
was prepared to change.


You might like to know that in the next C++ revision, hopefully called
C++09, there will be two new character types char16_t and char32_t, with
the properties you ask for.


Bo Persson



