Re: Byte size of characters when encoding
From: mikeb (mailbox.google_at_nospam.mailnull.com)
Date: 07/09/04
- Next message: Nick Parker: "RE: How to wait for at file to be closed....?"
- Previous message: jcjj: "RE: How to wait for at file to be closed....?"
- In reply to: Vladimir: "Byte size of characters when encoding"
- Next in thread: Vladimir: "Re: Byte size of characters when encoding"
- Reply: Vladimir: "Re: Byte size of characters when encoding"
- Messages sorted by: [ date ] [ thread ]
Date: Fri, 09 Jul 2004 14:36:54 -0700
Vladimir wrote:
> Method UnicodeEncoding.GetMaxByteCount(charCount) returns charCount * 2.
> Method UTF8Encoding.GetMaxByteCount(charCount) returns charCount * 4.
>
> But why that?
Strings in .NET are already Unicode (UTF-16) encoded. So if you encode
the string to an array of bytes with UnicodeEncoding, you get 2 bytes
per Char.
However, in UTF-8 a single Unicode character can take up to 4 bytes.
charCount * 4 is just a conservative worst-case bound, sized as if every
element of the string needed a 4-byte encoding.
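
As a rough illustration of the byte counts being discussed, here is a small sketch. It is written in Python rather than .NET, purely to count bytes; the helper name utf16_units and the sample characters are my own choices, not anything from the framework. It treats one UTF-16 code unit as the equivalent of one .NET Char, and shows that a character needing 4 bytes in UTF-8 also occupies two code units.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units (assumed here to correspond to .NET Char values)."""
    return len(text.encode("utf-16-le")) // 2


# Hypothetical sample characters, one for each UTF-8 encoded length.
samples = {
    "U+0041 (A)": "\u0041",
    "U+00E9 (e acute)": "\u00e9",
    "U+20AC (euro sign)": "\u20ac",
    "U+1F600 (emoji)": "\U0001F600",
}

for label, ch in samples.items():
    units = utf16_units(ch)
    utf16_bytes = len(ch.encode("utf-16-le"))
    utf8_bytes = len(ch.encode("utf-8"))
    # Report how many bytes each encoding needs for this one character.
    print(f"{label}: {units} code unit(s), "
          f"{utf16_bytes} UTF-16 bytes, {utf8_bytes} UTF-8 bytes")
```

In this sketch UTF-16 always comes out at 2 bytes per code unit, while UTF-8 ranges from 1 to 3 bytes for characters that fit in a single code unit and reaches 4 bytes only for a character that takes two code units. Whether a given framework version sizes its worst-case estimate per code unit or per character is something to check against its documentation.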
>
> Look:
>
> /*
> Each Unicode character in a string is defined by a Unicode scalar value,
> also called ...
>
> An index is the position of a Char, not a Unicode character, in a String. An
> index is a zero-based, nonnegative number starting from the first position
> in the string, which is index position zero. Consecutive index values might
> not correspond to consecutive Unicode characters because a Unicode character
> might be encoded as more than one Char. To work with each Unicode character
> instead of each Char, use the System.Globalization.StringInfo class.
> */
>
> With UTF-8 encoding one instance of struct Char can only occupy 1/2, 1, 1
> 1/2, 2 bytes?
> Isn't it?
> Therefore UTF8Encoding.GetMaxByteCount(charCount) must returns charCount *
> 2.
> Because charCount means count of instance of struct Char.
> Or not? May be it means count of Unicode characters?
> If not, then UnicodeEncoding.GetMaxByteCount(charCount) must returns
> charCount * 4.
>
> This methods does not fit each other.
>
>
-- mikeb