Re: Byte array to string



Franco, Gustavo <gustavo_franco[REMOVEIT]@hotmail.com> wrote:
> Be careful with System.Text.ASCIIEncoding.ASCII.GetString(b);
>
> If your code page is not an English one, the conversion can have
> problems (not always).
>
> I had this problem with a program running in Korea, where the code page
> was different and the range of Extended ASCII characters supported
> was 0 to 239 instead of 0 to 255.

Careful here - there's no such encoding as "Extended ASCII". There are
various character encodings which *are* extensions to ASCII, but no one
"extended ASCII".

> I ran into this problem on several Windows installations in different
> languages.
>
> If you want to get a string back, you can use something like this:
>
> System.Text.Encoding.GetEncoding(1251).GetString(b); (1251 is English)

1251 is just *one* code page, and it's the Windows Cyrillic one at that.
The Western European Windows code page is 1252.

> You can see how the conversion fails if you go to Control Panel, Regional
> Settings, Advanced and set "Language for non-Unicode programs" to Japanese.
>
> The strange thing is that it clearly says "Language for non-Unicode programs",
> and I know .NET is fully Unicode.

Yes, but the point is that a byte array isn't an array of characters.
You need to know which encoding those bytes use to represent characters
before you can turn them into Unicode text.
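
To make that concrete, here is a minimal sketch showing the same two bytes
decoded under two different encodings. The class and method names
(DecodeDemo, Decode) are made up for illustration, and ISO-8859-1 is used
only because it is available by name on every .NET runtime I know of; it is
not a suggestion that it is the right encoding for your data.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Hypothetical helper: decodes the bytes under whichever encoding the caller names.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        // These two bytes are the UTF-8 form of U+00E9 (e with acute accent).
        byte[] b = { 0xC3, 0xA9 };
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        string asUtf8 = Decode(b, Encoding.UTF8);   // one character: U+00E9
        string asLatin1 = Decode(b, latin1);        // two characters: U+00C3, U+00A9

        Console.WriteLine(asUtf8.Length);    // 1
        Console.WriteLine(asLatin1.Length);  // 2
    }
}
```

Neither result is "wrong" from the decoder's point of view. Only knowledge
of how the bytes were produced tells you which one is the text you wanted.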

> Now, why does using System.Text.ASCIIEncoding.ASCII fail? I don't know...

Because any byte values greater than 127 aren't ASCII.

> It doesn't really fail, but if you encode a string with extended characters
> into bytes and decode it back into a string, you will get different results. I
> guess .NET maps each extended character to the nearest one supported by the
> code page.

You're asking the encoding to deal with bytes which it can't handle -
that's why things go wrong.
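
A rough sketch of what goes wrong, assuming a current .NET runtime where
Encoding.ASCII substitutes '?' for anything it can't represent (older
runtimes may behave differently). The names AsciiRoundTrip, RoundTrip and
StrictAsciiAccepts are hypothetical. The second helper asks for an ASCII
encoding with exception fallbacks, which is one way to get an error
instead of silent substitution.

```csharp
using System;
using System.Text;

static class AsciiRoundTrip
{
    // Encodes the text to bytes and decodes it again with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    // Returns false if any byte is outside the 7-bit ASCII range.
    public static bool StrictAsciiAccepts(byte[] bytes)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        string s = "caf\u00E9"; // "cafe" with an acute accent on the e

        Console.WriteLine(RoundTrip(s, Encoding.ASCII) == s); // False: the accented char is lost
        Console.WriteLine(RoundTrip(s, Encoding.UTF8) == s);  // True: UTF-8 covers all of Unicode

        Console.WriteLine(StrictAsciiAccepts(new byte[] { 0x41, 0xE9 })); // False: 0xE9 > 127
    }
}
```

If the bytes really might contain values above 127, the fix is to pick the
encoding that actually produced them, not to make ASCII more forgiving.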

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.

--
Jon Skeet - <skeet@xxxxxxxxx>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too