Re: C# and encodings




On Feb 3, 3:41 pm, "Peter Duniho" <NpOeStPe...@xxxxxxxxxxxxxxxx>
wrote:
> Windows only has one current code page at a time.

Well, not quite - Windows (or rather, a specific user) has one locale
at a time, but two associated non-Unicode codepages - one for GUI (aka
"ANSI" in Win32 parlance - this is what Encoding.Default returns), one
for text mode ("OEM") - a legacy of DOS. If I remember correctly, the
latter one can actually be changed using "chcp" within the context of
a specific command line session - another DOS artifact.
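To illustrate the ANSI/OEM split, here is a minimal sketch. It uses Python for brevity, and it assumes cp1252 and cp437 as the ANSI and OEM code pages (the usual pair for a US English setup; other locales use other pairs). The constant names are mine, not anything Windows defines.

```python
import unicodedata

# Assumption: cp1252 / cp437 stand in for the "ANSI" and "OEM" code pages
# of a US English Windows setup. Other locales use different pairs.
ANSI_CODEPAGE = "cp1252"
OEM_CODEPAGE = "cp437"

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The same character maps to a different byte in each code page.
ansi_bytes = text.encode(ANSI_CODEPAGE)
oem_bytes = text.encode(OEM_CODEPAGE)
print(ANSI_CODEPAGE, ansi_bytes.hex())
print(OEM_CODEPAGE, oem_bytes.hex())

# Bytes produced under the ANSI code page but interpreted under the OEM
# code page decode to a different character.
misread = ansi_bytes.decode(OEM_CODEPAGE)
print(unicodedata.name(misread))
```

This is the reason text written by a GUI program can show up garbled in a console window: the bytes are the same, only the code page used to interpret them differs.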

>> b) Can code page support Unicode coded character set, but may use
>> different encoding than Unicode does (Unicode set uses three
>> encodings - UTF-8, UTF-16 and UTF-32)?
>
> The code page can be and often is not Unicode. Any character encoding
> that is not Unicode by definition uses a different encoding than Unicode
> does.

I think we need to start distinguishing between character set and
encoding here :)

Character set - a set of valid characters (code points in Unicode
parlance). Unicode is a character set, but not an encoding. CP1250
also defines a character set, one that is always encoded the same way:
one 8-bit byte per character.

Encoding - a way of encoding a specific character set as a sequence of
bytes. UTF-8, UTF-16 and UTF-32 (UCS-4) are all encodings of Unicode.
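
The distinction can be sketched in a few lines. Python is used here for brevity, and the sample characters are my own choice. One character from the Unicode character set is turned into bytes in several ways, and a character outside the CP1250 character set cannot be encoded with it at all.

```python
# One character, several encodings.
text = "\u017e"  # LATIN SMALL LETTER Z WITH CARON, part of CP1250's set

encodings = ["cp1250", "utf-8", "utf-16-le", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}
for name, data in encoded.items():
    # Same character, different byte sequence and length per encoding.
    print(name, data.hex(), len(data))

# A character that is in Unicode but not in the CP1250 character set.
try:
    "\u044f".encode("cp1250")  # CYRILLIC SMALL LETTER YA
    in_cp1250 = True
except UnicodeEncodeError:
    in_cp1250 = False
print("U+044F encodable in cp1250:", in_cp1250)
```

In C# the rough equivalent goes through System.Text.Encoding, for example Encoding.GetEncoding(1250).GetBytes(...).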

A Windows code page is in fact an encoding, not a character set (as
evidenced by the fact that there are specific codepages for encodings
of Unicode: 65001 for UTF-8 and 65000 for UTF-7).
