Re: Problem with encoding a character


On Tue, 15 Sep 2009 12:27:49 -0700, David <david.colliver.NEWS@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi Peter,

That code sample is a full sample, apart from the URL and sign-in details. I
am not sure what else I can show.

Here are some links you may find helpful:
http://www.yoda.arachsys.com/csharp/complete.html
http://www.yoda.arachsys.com/csharp/incomplete.html
http://sscce.org/ (some Java-centric stuff, but mostly applicable to any programming questions)

It is actually failing in the finally, the myWriter.Close().

As the references above will point out, part of asking a good question is providing accurate information regarding what's failing.

I am also speaking with the SMS people to come to a solution. I have figured
that it is down to the byte length, not the character length. Basically, the
pound symbol is looking like 2 bytes instead of 1. I know this appears to be
happening, as when I change the ContentLength by adding one (+ 1), the pound
symbol goes through.

In other words, your problem very well may be caused by the problem I noted in my previous post: that you are setting the ContentLength property incorrectly by providing character length instead of byte length.
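To illustrate the difference, here is a minimal sketch (not your code; the message text and the helper name Utf8ByteCount are made up for the example) comparing the character count of a string with the number of bytes it occupies once encoded as UTF-8:

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    // Hypothetical helper: number of bytes the text occupies once encoded as UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    static void Main()
    {
        // "\u00A3" is the pound sign; the sample text is invented for illustration.
        string message = "Cost: \u00A35";

        Console.WriteLine(message.Length);          // 8 characters
        Console.WriteLine(Utf8ByteCount(message));  // 9 bytes: the pound sign takes two
    }
}
```

If ContentLength is set from message.Length here, it is one byte short, which would match the "+ 1" behaviour described above.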

It was the text in the error message that gave me that clue and also the
stepping through and seeing it fail on the finally, without even being
caught by the catch. With the error message stating that it can't close
until all bytes are sent, it was the realisation that the pound sign £ must
be two bytes instead of one, as without the symbol, everything worked hunky
dory.

I thought the pound sign was a unicode character, but when I tried to change
it to unicode in the opening of the myWriter, it still failed.

The pound sign, in and of itself, neither is nor is not a Unicode character. It is simply a specific character, referenced in some character encodings by a specific character code. Some character encodings include the character, some do not (ASCII, for one, does not). If you want to represent the pound sign, you have to use an encoding that supports that character. And yes, in some of those encodings the pound sign will be more than one byte: it is a single byte (0xA3) in ISO-8859-1, but two bytes in UTF-8 (0xC2 0xA3) and two bytes in UTF-16.
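As a rough illustration, the following sketch prints the bytes that a few of the standard .NET encodings produce for the pound sign. The class and method names are invented for the example:

```csharp
using System;
using System.Text;

public static class PoundEncodings
{
    // Hypothetical helper: the encoded bytes of the text as a hex string such as "C2-A3".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string pound = "\u00A3"; // the pound sign

        Console.WriteLine(Hex(Encoding.GetEncoding("iso-8859-1"), pound)); // A3
        Console.WriteLine(Hex(Encoding.UTF8, pound));                      // C2-A3
        Console.WriteLine(Hex(Encoding.Unicode, pound));                   // A3-00 (UTF-16, little-endian)
        Console.WriteLine(Hex(Encoding.ASCII, pound));                     // 3F, a '?', since ASCII has no pound sign
    }
}
```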

I don't know what you mean by "tried to change it to unicode". Change what to Unicode? The default for StreamWriter is UTF-8 anyway. What else was there to "change"?

If the SMS people come back with a solution or I work it out, I will post it
for reference.

It seems to me that the cause of the immediate problem, the exception, is obvious: you were not setting the ContentLength property correctly. You can fix that by handling the encoding yourself, so that you know exactly how many bytes your message will use before you set ContentLength, and then writing those bytes directly to the request Stream rather than using the StreamWriter class. The Encoding class will do that conversion for you. Alternatively, write to a MemoryStream using StreamWriter before you need to set the ContentLength property, and then write the bytes from the MemoryStream out to your request Stream.
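Both approaches might look something like the sketch below. It is only an outline under some assumptions: UTF-8 is what the recipient expects, the class and method names and the sample text are invented, and the actual HttpWebRequest calls are shown only as comments because I don't have your URL or sign-in details:

```csharp
using System;
using System.IO;
using System.Text;

public static class RequestBody
{
    // Approach 1: encode the text directly with the Encoding class.
    public static byte[] EncodeDirect(string text)
    {
        return new UTF8Encoding(false).GetBytes(text); // false = no byte-order mark
    }

    // Approach 2: let StreamWriter do the encoding, but into a MemoryStream first.
    public static byte[] EncodeViaMemoryStream(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, new UTF8Encoding(false)))
            {
                writer.Write(text);
            } // disposing the writer flushes it and closes the MemoryStream

            return buffer.ToArray(); // ToArray still works after the stream is closed
        }
    }

    static void Main()
    {
        string postData = "Cost: \u00A35"; // invented sample text containing a pound sign
        byte[] body = EncodeDirect(postData);

        Console.WriteLine(body.Length); // the byte count, which is what ContentLength needs

        // With an HttpWebRequest named "request" (not created here), roughly:
        //   request.ContentLength = body.Length;
        //   using (Stream stream = request.GetRequestStream())
        //   {
        //       stream.Write(body, 0, body.Length);
        //   }
    }
}
```

Either way, the byte array exists before ContentLength is set, so the length you declare and the number of bytes you write cannot disagree.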

As for whether you are able to get your recipient to understand an encoding that includes the pound sign (e.g. UTF-8, the default for StreamWriter), that would be an issue specific to the recipient, and not really a C#/.NET issue at all.

Pete