Re: strings in C++




"David Webber" <dave@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:O3i$Tg5uIHA.4848@xxxxxxxxxxxxxxxxxxxxxxx

"Giovanni Dicanio" <giovanni.dicanio@xxxxxxxxxxx> wrote in message
news:%2379$ux4uIHA.5472@xxxxxxxxxxxxxxxxxxxxxxx

...
I tend to save text out of application boundaries using Unicode UTF-8
(char's), ...

Why? [I am not criticising - just being curious!]

Hi David,

I do that because UTF-8 seems to me very useful (and a kind of "de facto"
standard) for exchanging textual data across platforms. For example, I
think that UTF-8 is the default encoding for XML. UTF-8 is also widely used
on the Internet in general.

Moreover, I like UTF-8 because it wastes no memory on "normal" ASCII
characters: each one takes a single byte, whereas in UTF-16 each one takes
two bytes, the second being just a zero byte that pads it to 16 bits.
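
As a rough illustration of the memory point, here is a minimal sketch that
counts the bytes needed for the same text in both encodings. It assumes a
C++11 compiler (for std::u16string and u"" literals); the function names
are hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store the text as UTF-8: one byte per char code unit.
std::size_t utf8_byte_count(const std::string& text) {
    return text.size() * sizeof(char);
}

// Bytes needed to store the text as UTF-16: sizeof(char16_t) bytes
// (two on mainstream platforms) per code unit.
std::size_t utf16_byte_count(const std::u16string& text) {
    return text.size() * sizeof(char16_t);
}
```

For the ASCII text "Hello", the first function reports 5 bytes and the
second reports 10 for u"Hello". The gap narrows for non-ASCII text: U+00E8
takes two bytes in either encoding.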

Another aspect I like about UTF-8 is that it has no endianness problem,
i.e. UTF-8 is "just UTF-8" on every platform: Windows, Mac, Linux, etc.
UTF-16, instead, comes in two flavors, UTF-16 LE and UTF-16 BE, and you
have to check the BOM (if present...) to find out which byte order the file
you are reading uses. In fact, I think it is neither safe nor robust to
assume that UTF-16 is always UTF-16 LE (the default on Windows); there is
also UTF-16 BE, which I think is used on Macs.
If I save (and load) the file (or textual data in general) using UTF-8, I
don't have this additional problem of platform endianness.
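
The BOM check mentioned above can be sketched as follows. This is only an
illustration, not a complete encoding detector: the enum and function names
are hypothetical, and a buffer without a BOM tells you nothing about its
byte order.

```cpp
#include <cstddef>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Report which byte order mark, if any, the buffer starts with.
// Caveat: FF FE 00 00 is also the UTF-32 LE BOM, which this sketch
// does not try to tell apart from UTF-16 LE.
Bom detect_bom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Bom::Utf8;      // EF BB BF
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Bom::Utf16LE;   // FF FE
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Bom::Utf16BE;   // FE FF
    return Bom::None;          // no BOM: byte order must come from elsewhere
}
```

A reader would call detect_bom on the first few bytes of the file and, for
UTF-16 BE input on a little-endian machine, swap the two bytes of every
code unit before using the text.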

I use UTF-16 (with Windows endianness, i.e. little-endian) inside Windows
applications because it is the default Unicode format supported by the
Windows APIs (the <DoSomething>W ones).
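
On Windows, the conversion at the application boundary is typically done
with MultiByteToWideChar (passing CP_UTF8) and WideCharToMultiByte. To show
what the UTF-8 to UTF-16 direction involves, here is a portable sketch that
compiles without the Windows headers. The function name is hypothetical,
and the validation is deliberately minimal: malformed bytes become U+FFFD,
and overlong forms are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode as UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = lead;   // code point being assembled
        std::size_t extra = 0;     // continuation bytes expected
        bool ok = true;

        if (lead < 0x80)                { /* ASCII: one byte */ }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else                            { ok = false; }  // invalid lead byte

        // A sequence that runs past the end of the input is truncated.
        if (ok && i + extra >= in.size()) ok = false;

        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }

        // Surrogates and values above U+10FFFF are not valid code points.
        if (ok && (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) ok = false;

        if (!ok) {
            out.push_back(static_cast<char16_t>(0xFFFD));
            i += 1;                // skip one byte and resynchronize
            continue;
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points beyond the BMP need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The result holds code units in the machine's native byte order, which is
what the W functions expect on Windows; production code would more likely
rely on the platform API than on a hand-written decoder like this one.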

And I think that C# and the .NET Framework use the same approach by
default: they save textual data as UTF-8, and convert it to UTF-16 (the
.NET String class) when the text is used inside the application.
In fact, I read here:

http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx

<cite>
StreamWriter defaults to using an instance of UTF8Encoding unless specified
otherwise. [...]
</cite>

Giovanni

