Re: C# and encodings



beginwithl@xxxxxxxxx wrote:
1)

a) With "Encoding.Default" you retrieve system’s default code page.
But if windows has numerous code pages, then what exactly would
default page be, meaning where ( or in what apps ) does windows use
this default page over other code pages?

On Western European and US Windows installations it is usually code page 1252 (the "ANSI" code page), which is very close to ISO-8859-1.

There must be some Windows setting to change it, but I don't
know where.
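A minimal sketch to see what Encoding.Default resolves to on your own machine. The class and method names (DefaultEncodingDemo, Describe) are made up for illustration. Note that the result depends on the runtime: on .NET Framework it is the Windows ANSI code page, while on .NET Core and .NET 5+ Encoding.Default is always UTF-8.

```csharp
using System;
using System.Text;

public static class DefaultEncodingDemo
{
    // Describes whatever Encoding.Default resolves to on the current runtime.
    public static string Describe()
    {
        Encoding enc = Encoding.Default;
        return enc.WebName + " (code page " + enc.CodePage + ")";
    }

    public static void Main()
    {
        // .NET Framework: the Windows ANSI code page, e.g. 1252.
        // .NET Core / .NET 5+: UTF-8, regardless of the system setting.
        Console.WriteLine(Describe());
    }
}
```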

b) Can code page support Unicode coded character set, but may use
different encoding than Unicode does ( Unicode set uses three
encodings - UTF-8, UTF-16 and UTF-32 )?

If you need Unicode support, then you should use UTF-8.

UTF-8 has a code page number as well (65001, according to Wikipedia), but in .NET
you just call it UTF-8 (Encoding.UTF8).
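The code page number can be checked from .NET itself. A small sketch (Utf8CodePageDemo and its methods are hypothetical names, not part of any library):

```csharp
using System;
using System.Text;

public static class Utf8CodePageDemo
{
    // Code page number that .NET reports for its built-in UTF-8 encoding.
    public static int Utf8CodePage()
    {
        return Encoding.UTF8.CodePage;
    }

    // Looks the encoding up by number and returns its registered name.
    public static string NameOf65001()
    {
        return Encoding.GetEncoding(65001).WebName;
    }

    public static void Main()
    {
        Console.WriteLine(Utf8CodePage());  // 65001
        Console.WriteLine(NameOf65001());   // utf-8
    }
}
```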

c)

* Are there also 8-bit code pages which use Unicode character
encoding, and thus have only 255 code points matched to characters?

That is what all the regular code pages do.

* Can these code pages also use UTF-16 or UTF-32 encoding?

* Are there also code pages that support more than 255, but less than
2^16 code points?

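There are: the double-byte (DBCS) code pages used for East Asian languages,
such as 932 (Shift-JIS) and 936 (GBK), cover far more than 256 characters.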

BTW, in Western countries ANSI/ISO-8859-1/CP1252 and UTF-8 are
what you use.

3) I noticed there are only four classes derived from Encoding class
( ASCIIEncoding, UTF8Encoding, UnicodeEncoding and UTF7Encoding ).What
if you want to use some other, non-unicode encoding?

Encoding.GetEncoding(codepage)
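A short sketch of that call, using ISO-8859-1 (code page 28591) because it is available on both .NET Framework and .NET Core without extra setup. On .NET Core and later, Windows code pages such as 1252 additionally need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) before GetEncoding will find them. The names GetEncodingDemo and EncodeLatin1 are made up for this example.

```csharp
using System;
using System.Text;

public static class GetEncodingDemo
{
    // Encodes text with ISO-8859-1, looked up by its code page number.
    public static byte[] EncodeLatin1(string text)
    {
        Encoding latin1 = Encoding.GetEncoding(28591);
        return latin1.GetBytes(text);
    }

    public static void Main()
    {
        string text = "\u00E9"; // e with acute accent

        // One byte in the 8-bit code page, two bytes in UTF-8.
        Console.WriteLine(BitConverter.ToString(EncodeLatin1(text)));           // E9
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text))); // C3-A9
    }
}
```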

5)
a) “Internally, the .NET Framework stores text as Unicode UTF-16.”

I assume that the above quote is only referring to String objects and
char variables using UTF-16 encoding,

UTF-8 and ANSI are external formats.

Internally, all string and char values use 16-bit Unicode (UTF-16).
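To make the internal/external distinction concrete, here is a small sketch (Utf16Demo and CodeUnits are illustrative names). It uses a character outside the Basic Multilingual Plane, which takes two 16-bit chars in memory but a different number of bytes once encoded for output.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // string.Length counts UTF-16 code units, not characters.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    public static void Main()
    {
        string smiley = "\U0001F600"; // one code point outside the BMP

        Console.WriteLine(sizeof(char));                          // 2
        Console.WriteLine(CodeUnits(smiley));                     // 2
        Console.WriteLine(char.IsSurrogatePair(smiley, 0));       // True
        Console.WriteLine(Encoding.UTF8.GetByteCount(smiley));    // 4
        Console.WriteLine(Encoding.Unicode.GetByteCount(smiley)); // 4
    }
}
```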

b) Ignoring the fact that FE FF sequence identifies the type of
encoding, does U+FEFF also represent a character ( outside the context
of encoding )?

U+FEFF does exist as a Unicode code point (ZERO WIDTH NO-BREAK SPACE).
FE FF is its UTF-16 big-endian form; in UTF-8 the same code point is
the three bytes EF BB BF.
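For what it is worth, .NET will show the byte sequences it uses for U+FEFF. A minimal sketch (BomDemo and Hex are hypothetical names); GetPreamble returns the byte order mark an encoding writes at the start of a stream, and GetBytes encodes the character itself:

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Formats bytes as hex pairs separated by dashes.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(Hex(Encoding.BigEndianUnicode.GetPreamble())); // FE-FF
        Console.WriteLine(Hex(Encoding.Unicode.GetPreamble()));          // FF-FE
        Console.WriteLine(Hex(Encoding.UTF8.GetPreamble()));             // EF-BB-BF

        // The same code point encoded as an ordinary character.
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes("\uFEFF")));        // EF-BB-BF
    }
}
```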


6)
Say app1 ( running on PC1 ) and app2 ( running on PC2 ) communicate
via network using TCP/IP protocol. PC1 uses little endian-order, while
PC2 uses big-endian order. Now, I know we send information over TCP/IP
( and networks in general ) using big-endian order, but:

a) But does only data in the packet’s header uses this byte order,
while application data is sent just as it is, without reversing its
byte order ( assuming this data is sent over the network by PC1 )?

Network transport protocols usually agree on endianness for
header info - typically they use network order, which is big endian.

For application data (payload) your app needs to handle it.
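One way an app could handle it, sketched under the assumption that both ends agree to put multi-byte integers on the wire in big-endian order. The helper names (NetworkOrderDemo, ToBigEndian, FromBigEndian) are invented for this example. Shifting bytes out explicitly gives the same result on little-endian and big-endian machines.

```csharp
using System;

public static class NetworkOrderDemo
{
    // Serializes a 32-bit integer most significant byte first.
    public static byte[] ToBigEndian(int value)
    {
        unchecked
        {
            return new byte[]
            {
                (byte)(value >> 24),
                (byte)(value >> 16),
                (byte)(value >> 8),
                (byte)value
            };
        }
    }

    // Rebuilds the integer from four big-endian bytes.
    public static int FromBigEndian(byte[] bytes)
    {
        return (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3];
    }

    public static void Main()
    {
        byte[] wire = ToBigEndian(0x01020304);
        Console.WriteLine(BitConverter.ToString(wire));       // 01-02-03-04
        Console.WriteLine(FromBigEndian(wire) == 0x01020304); // True

        // Byte order of the machine running this code; not needed by the helpers.
        Console.WriteLine(BitConverter.IsLittleEndian);
    }
}
```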

b) If so, then if PC1 sends some .exe file to PC2, then how will PC2
know whether it came from little endian-machine and thus should
reverse bytes before trying to load this .exe file?

This is payload and will not be reversed. A file is just a sequence
of bytes, and TCP delivers bytes in the order they were sent.

And why should it be?

If you upload a PC EXE from a little endian Windows x86 to
a big endian Solaris SPARC, then the EXE would not run
anyway.

Arne


