Re: C# and encodings
- From: Pavel Minaev <int19h@xxxxxxxxx>
- Date: Tue, 3 Feb 2009 22:01:26 -0800 (PST)
On Feb 3, 3:41 pm, "Peter Duniho" <NpOeStPe...@xxxxxxxxxxxxxxxx> wrote:
> Windows only has one current code page at a time.
Well, not quite - Windows (or rather, a specific user) has one locale
at a time, but two associated non-Unicode codepages - one for GUI (aka
"ANSI" in Win32 parlance - this is what Encoding.Default returns), one
for text mode ("OEM") - a legacy of DOS. If I remember correctly, the
latter one can actually be changed using "chcp" within the context of
a specific command line session - another DOS artifact.
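To make the ANSI/OEM split concrete, here is a small sketch in Python rather than C#, just to keep it short and runnable anywhere. It assumes a Central European locale, where the ANSI code page is 1250 and the OEM one is 852. The constant names are made up for illustration. In .NET the rough counterpart would be Encoding.GetEncoding(1250) vs. Encoding.GetEncoding(852).

```python
# Assumption for illustration: a Central European locale, where the
# GUI ("ANSI") code page is 1250 and the console ("OEM") code page is 852.
ANSI_CODEPAGE = "cp1250"
OEM_CODEPAGE = "cp852"

text = "\u010d"  # LATIN SMALL LETTER C WITH CARON, present in both code pages

ansi_bytes = text.encode(ANSI_CODEPAGE)
oem_bytes = text.encode(OEM_CODEPAGE)

# Same character, a different single byte depending on the code page.
print(ANSI_CODEPAGE, ansi_bytes.hex())
print(OEM_CODEPAGE, oem_bytes.hex())

# Decoding ANSI bytes as if they were OEM yields the wrong character,
# which is the usual "garbled text in the console window" symptom.
misread = ansi_bytes.decode(OEM_CODEPAGE)
print("misread as", ascii(misread))
```

This is why a GUI app and a console app running under the same user can disagree about the very same file contents.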
>> b) Can a code page support the Unicode coded character set, but use a
>> different encoding than Unicode does (the Unicode set uses three
>> encodings - UTF-8, UTF-16 and UTF-32)?
> The code page can be, and often is, not Unicode. Any character encoding
> that is not Unicode by definition uses a different encoding than Unicode
> does.
I think we need to start distinguishing between character set and
encoding here :)
Character set - a set of valid characters (code points in Unicode
parlance). Unicode is a character set, but not an encoding. CP1250 is
a character set that is always encoded as exactly one 8-bit byte per
character.
Encoding - a way of encoding a specific character set as a sequence of
bytes. UTF-8, UTF-16 and UCS-4 are all encodings of Unicode.
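A quick Python sketch of the distinction. One code point from the Unicode character set produces a different byte sequence under each encoding. CP1250, being a much smaller character set, simply has no slot for most of Unicode. The function name encode_all is hypothetical, just for this illustration.

```python
# One abstract character (code point) from the Unicode character set...
CHAR = "\u0141"  # LATIN CAPITAL LETTER L WITH STROKE

# ...and several encodings that turn it into different byte sequences.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le", "cp1250")


def encode_all(text):
    """Map each encoding name to the bytes it produces for `text`."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


for enc, data in encode_all(CHAR).items():
    print(f"U+{ord(CHAR):04X} as {enc:9}: {data.hex()}")

# CP1250's character set is far smaller than Unicode's: a Cyrillic letter
# has no representation in it, so strict encoding fails.
try:
    "\u0436".encode("cp1250")  # CYRILLIC SMALL LETTER ZHE
except UnicodeEncodeError as exc:
    print("cp1250 cannot encode it:", exc.reason)
```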
A Windows code page is in fact an encoding, not a character set (as
evidenced by the fact that there's a specific codepage for UTF-8, 65001,
and even one for UTF-7, 65000).
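Seen that way, a code page number is just a name for "how to turn text into bytes". Below is a hypothetical lookup table from a few Windows code page identifiers to the Python codecs implementing the same byte encodings. The table and helper are illustrative only and are not a Windows or .NET API.

```python
# Hypothetical lookup table: a few Windows code page identifiers and the
# Python codec that implements the same byte encoding.
CODEPAGE_TO_CODEC = {
    1250: "cp1250",     # Central European "ANSI"
    852: "cp852",       # Central European "OEM"
    1200: "utf-16-le",  # Unicode, UTF-16 little-endian
    65000: "utf-7",
    65001: "utf-8",
}


def encode_with_codepage(text, codepage):
    """Encode `text` with the codec standing in for the given code page."""
    return text.encode(CODEPAGE_TO_CODEC[codepage])


sample = "\u0141\u00f3d\u017a"  # "Lodz" with Polish diacritics
for cp in sorted(CODEPAGE_TO_CODEC):
    # bytes repr is ASCII-only, so this prints safely on any console
    print(cp, encode_with_codepage(sample, cp))
```

The single-byte code pages and the Unicode ones sit side by side in the same table, which is the point: each entry is an encoding, not a character set.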