Re: strings in C++
- From: "Giovanni Dicanio" <giovanni.dicanio@xxxxxxxxxxx>
- Date: Thu, 22 May 2008 10:21:42 +0200
"David Webber" <dave@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:O3i$Tg5uIHA.4848@xxxxxxxxxxxxxxxxxxxxxxx
"Giovanni Dicanio" <giovanni.dicanio@xxxxxxxxxxx> wrote in message
news:%2379$ux4uIHA.5472@xxxxxxxxxxxxxxxxxxxxxxx
...
I tend to save text out of application boundaries using Unicode UTF-8
(char's), ...
Why? [I am not criticising - just being curious!]
Hi David,
I do that because UTF-8 seems to me very useful (a kind of "de facto"
standard) for exchanging textual data between platforms. For example, I
think UTF-8 is the default encoding for XML (what a parser assumes when there
is no encoding declaration or BOM). UTF-8 is also widely used on the Internet
in general.
Moreover, I like UTF-8 because it wastes no memory on "normal" ASCII
characters: each one takes a single byte, whereas in UTF-16 it takes two
bytes, one of which is just a zero byte padding it to 16 bits.
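To make the memory point concrete, here is a minimal sketch (not from the original post) that compares the storage needed for the same ASCII text in UTF-8 and in UTF-16. The helper names Utf8ByteCount, Utf16ByteCount and CheckAsciiDoubling are hypothetical, and the sketch assumes a C++11 compiler on a platform where char16_t is two bytes wide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes occupied by the code units of a UTF-8 string (1 byte per code unit).
std::size_t Utf8ByteCount(const std::string& utf8) {
    return utf8.size() * sizeof(char);
}

// Bytes occupied by the code units of a UTF-16 string
// (2 bytes per code unit, assuming char16_t is 16 bits wide).
std::size_t Utf16ByteCount(const std::u16string& utf16) {
    return utf16.size() * sizeof(char16_t);
}

// Hypothetical self-check: for pure ASCII text, UTF-16 takes twice the space.
void CheckAsciiDoubling() {
    const std::string    narrow = "Hello";   // plain ASCII is already valid UTF-8
    const std::u16string wide   = u"Hello";  // same text, one 16-bit unit per char
    assert(Utf16ByteCount(wide) == 2 * Utf8ByteCount(narrow));
}
```

The doubling only holds for ASCII; for characters outside that range UTF-8 uses two to four bytes each, so the advantage depends on the kind of text being stored.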
Another thing I like about UTF-8 is that it has no endianness problem,
i.e. UTF-8 is "just UTF-8" on every platform: Windows, Mac, Linux, etc.
UTF-16, instead, comes in two flavors, UTF-16 LE and UTF-16 BE, and you have
to check the BOM (if present...) to find out which endianness the file you
are reading uses. In fact, I think it is neither safe nor robust to assume
that UTF-16 is always UTF-16 LE (the default on Windows); there is also
UTF-16 BE, which I think is used on Macs.
If I save (and load) the file (or textual data in general) using UTF-8, I
don't have this additional problem of platform endianness.
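The BOM check mentioned above could look roughly like the following sketch. The names Bom, DetectBom and ReadUtf16Unit are made up for illustration; the byte patterns are the standard ones (EF BB BF for UTF-8, FF FE for UTF-16 LE, FE FF for UTF-16 BE). The sketch deliberately ignores UTF-32, whose little-endian BOM also starts with FF FE.

```cpp
#include <cassert>
#include <cstddef>

// Encodings this sketch can recognize from a leading byte order mark.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a buffer and report which BOM (if any) is there.
// Note: UTF-32 BOMs are not handled here.
Bom DetectBom(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Bom::Utf8;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Bom::Utf16LE;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;  // no BOM: the caller must know or guess the encoding
}

// Assemble one UTF-16 code unit from two bytes in the given byte order,
// independently of the endianness of the machine running the code.
char16_t ReadUtf16Unit(const unsigned char* p, bool bigEndian) {
    return bigEndian ? static_cast<char16_t>((p[0] << 8) | p[1])
                     : static_cast<char16_t>((p[1] << 8) | p[0]);
}
```

With UTF-8 the second function is simply not needed: the byte sequence is the same on every platform, and the optional UTF-8 BOM only serves as a signature.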
I use UTF-16 (with Windows endianness, i.e. little-endian) inside Windows
applications because it is the default Unicode format supported by the
Windows APIs (the <DoSomething>W ones).
And I think that C# and .NET framework use the same approach by default:
they save textual data using UTF-8, and convert to UTF-16 (.NET String
class) when the text is used inside the application.
In fact, I read here:
http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx
<cite>
StreamWriter defaults to using an instance of UTF8Encoding unless specified
otherwise. [...]
</cite>
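As a rough illustration of the "UTF-8 outside, UTF-16 inside" approach described above, here is a portable sketch of the conversion done when loading text. It is not the code from the original post: on Windows one would normally call the Win32 function MultiByteToWideChar with CP_UTF8 instead. The function name Utf8ToUtf16 is hypothetical, and the sketch does not reject overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units (sketch: overlong forms not rejected).
std::u16string Utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;    // code point being assembled
        std::size_t extra = 0;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid Unicode code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));   // fits in one code unit
        } else {
            cp -= 0x10000;                              // split into a surrogate pair
            assert(cp <= 0xFFFFF);
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The result holds code units in the native byte order of the machine, which is what the in-memory string needs; byte order only becomes a concern again if the UTF-16 data is written back out as raw bytes.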
Giovanni