RE: Object serialization and NetworkStream - extraneous characters in output

From: jwallison (jwallison_at_nospam.net)
Date: 12/16/04


Date: Thu, 16 Dec 2004 10:37:45 -0500

My .Net socket test client WAS erroneously using Encoding.ASCII (ah, the
joys of midnight testing!). Changing that to UTF8 produces the same result
that the Java developer is reporting: a "?" is received at the beginning
of every deserialized message on the socket.

So, the "o;" is the encoding information on the packet, but the "?" is
extraneous.
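
A possible explanation, offered as a sketch and not a confirmed diagnosis:
the three bytes 239, 187, 191 are the UTF-8 byte order mark, and a plain
UTF-8 decode turns them into the single character U+FEFF instead of
discarding them. A display that cannot render U+FEFF may show it as "?".
The names BomDemo and StripBom below are made up for illustration, and the
byte array is sample data, not a captured packet.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: drops one leading U+FEFF if present.
    public static string StripBom(string text)
    {
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }

    public static void Main()
    {
        // Sample data: the three marker bytes followed by the bytes for "<a/>".
        byte[] received = { 239, 187, 191, 60, 97, 47, 62 };

        string decoded = Encoding.UTF8.GetString(received);

        // The three marker bytes decode to one character, U+FEFF (65279).
        Console.WriteLine(decoded.Length);   // 5
        Console.WriteLine((int)decoded[0]);  // 65279

        // After stripping, only the XML text remains.
        Console.WriteLine(StripBom(decoded));
    }
}
```

If this is what is happening, the receiver could drop the first character
when it is U+FEFF before handing the text to the deserializer.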

What is the source of the extraneous character, and can it/should it be
eliminated? I seem to recall something like this from the days of DOS - is
it just an artifact of Socket communications in general?
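
If the marker turns out to be unwanted, one option is to stop emitting it
on the sending side. A minimal sketch, assuming the object is serialized
through a StreamWriter as in the quoted snippet below: constructing
UTF8Encoding with false asks the writer not to emit the three-byte
preamble. UserInfo and SerializeWithoutBom are placeholders, not the types
from the real project, and a MemoryStream stands in for the NetworkStream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Placeholder type standing in for the real serialized class.
public class UserInfo
{
    public string UserName;
    public int Age;
}

public static class NoBomSerialization
{
    // Hypothetical helper: serializes to UTF-8 bytes without a preamble.
    public static byte[] SerializeWithoutBom(object value)
    {
        XmlSerializer serializer = new XmlSerializer(value.GetType());
        MemoryStream ms = new MemoryStream();

        // false = do not write the 239, 187, 191 preamble.
        StreamWriter sw = new StreamWriter(ms, new UTF8Encoding(false));
        serializer.Serialize(sw, value);
        sw.Flush();

        // ToArray copies only the bytes actually written.
        return ms.ToArray();
    }

    public static void Main()
    {
        UserInfo ui = new UserInfo();
        ui.UserName = "jim";
        ui.Age = 30;

        byte[] bytes = SerializeWithoutBom(ui);

        // The first byte should now be '<' (60) instead of 239.
        Console.WriteLine(bytes[0]);
    }
}
```

The XML declaration should still name utf-8 as the encoding, so a receiver
that decodes the bytes as UTF-8 would no longer see a leading character.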

"Steven Cheng[MSFT]" <v-schang@online.microsoft.com> wrote in message
news:WnavcqM4EHA.768@cpmsftngxa10.phx.gbl...
> Hi Jim,
>
> Thanks for your posting. From your description, you're using .NET's
> XmlSerializer to serialize a class instance out to a NetworkStream, and
> at the other side, when you retrieve the stream and try reading the XML
> content out, you find an additional header "o;?" at the beginning of the
> XML stream, yes?
>
> As for the problem you mentioned, I think it is most likely an encoding
> problem. A Unicode text stream can start with a header (the byte order
> mark) that indicates the stream's encoding. "o;?" is how the UTF-8 header
> shows up here; other encodings such as UTF-16 use a different value, and
> an ASCII stream has no such header. To verify this, you can open a UTF-8
> text file in UltraEdit and view it in hex format: you'll find the header,
> which is composed of the three bytes 239, 187, 191. None of them is a
> 7-bit ASCII character, but the ASCII decoder here appears to drop the high
> bit of each byte (239 - 128 = 111 'o', 187 - 128 = 59 ';',
> 191 - 128 = 63 '?'), so they display as "o;?" if you print them as an
> ASCII string. For example:
>
> byte[] bytes = {239,187,191};
> MessageBox.Show(System.Text.Encoding.ASCII.GetString(bytes));
>
> So, when you use XmlSerializer to serialize an object into a stream
> through a writer with a Unicode encoding (UTF-8 for instance), the header
> is added as the first three bytes. If you decode the bytes back as UTF-8,
> you won't see these three bytes as "o;?": the UTF-8 decoder reads them as
> a single byte order mark character (U+FEFF), which normally isn't
> displayed, and a UTF-8 StreamReader skips it entirely. Here is a simple
> code snippet to show this:
>
> ===============================
> byte[] buffer = null;
>
> XmlSerializer serializer = new XmlSerializer(typeof(userInfo));
>
> userInfo ui = new userInfo();
> ui.userName = "steven cheng";
> ui.age = 20;
> ui.email = "steven@microsoft.com";
>
> MemoryStream ms = new MemoryStream();
>
> StreamWriter sw = new StreamWriter(ms,System.Text.Encoding.UTF8);
>
> serializer.Serialize(sw,ui);
>
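> sw.Flush(); // make sure everything has reached the MemoryStream
>
> buffer = ms.ToArray(); // GetBuffer() would also return unused capacity
>
> // shows the xml with a leading "o;?" because ASCII is the wrong
> // encoding for these bytes
> MessageBox.Show(System.Text.Encoding.ASCII.GetString(buffer));
>
> // doesn't show "o;?": UTF-8 (the correct encoding) decodes the header
> // into one byte order mark character, which normally isn't displayed
> MessageBox.Show(System.Text.Encoding.UTF8.GetString(buffer));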
> ==================================
>
> So, if the problem occurs in your Java client that receives this stream,
> I suggest you check the Java code to see whether it reads the stream and
> converts the bytes to a string using the correct encoding (UTF-8). I
> suspect that it is using the default ASCII encoding to read the bytes, so
> that the "o;?" comes out.
>
> Please have a look at the above. If anything is unclear, please feel
> free to post here.
> HTH.
>
> Regards,
>
> Steven Cheng
> Microsoft Online Support
>
> Get Secure! www.microsoft.com/security
> (This posting is provided "AS IS", with no warranties, and confers no
> rights.)
>
>
>
>

-- 
Regards,
Jim Allison
jwallison@nospam.net

