Re: UTF-8 encoding in AJAX web application.



"Jon Skeet [C# MVP]" <skeet@xxxxxxxxx> wrote in message
news:1174475482.881336.295840@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Mar 21, 8:40 am, "Allan Ebdrup" <ebd...@xxxxxxxxxxxxxxx> wrote:
No - the driver will do that for you.

Is this done by detecting the UTF-8 preamble? And then the driver converts to
UCS-2? And if so, how come the result is still in UTF-8 when I retrieve the
data again?

It's done by passing in strings as the parameters. At that stage
there's no encoding involved (well, sort of - all .NET strings are
actually UTF-16, which is very similar to UCS-2, but you can
effectively ignore that). In particular, it is meaningless to say that
a string (as a System.String) is "UTF-8 encoded".
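
To illustrate the point above, here is a minimal sketch. The class and helper names are made up for this example. It assumes the bytes on the wire really are UTF-8, and shows that once they are decoded the string only holds UTF-16 code units; an encoding comes into play again only when bytes are needed.

```csharp
using System;
using System.Text;

public static class StringEncodingDemo
{
    // Hypothetical helper: turn UTF-8 bytes (e.g. from the wire) into a .NET string.
    public static string DecodeUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // The euro sign (U+20AC) as three UTF-8 bytes.
        byte[] wire = { 0xE2, 0x82, 0xAC };

        string text = DecodeUtf8(wire);

        // The string holds a single UTF-16 code unit; the UTF-8 bytes are gone.
        Console.WriteLine(text.Length);    // 1
        Console.WriteLine((int)text[0]);   // 8364 (0x20AC)

        // Choosing an encoding only matters when converting back to bytes.
        Console.WriteLine(Encoding.UTF8.GetBytes(text).Length);     // 3
        Console.WriteLine(Encoding.Unicode.GetBytes(text).Length);  // 2 (UTF-16LE)
    }
}
```

The same System.String yields different byte sequences depending on the encoding asked for, which is why the string itself cannot be called "UTF-8 encoded".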

I don't get why you can't say a System.String is UTF-8 encoded, if the bytes
in the string have to be read with a UTF-8 encoding to make sense. Granted,
you would like the string to be UTF-16, but the bytes in the string have to
be read with a UTF-8 encoding to get the right meaning.

Where does my UTF-8 encoded XML get translated to UTF-16? I've got the
contents in a CDATA section and I fetch it using:
---
string text = xmlDocument.SelectSingleNode("QuestionText/text()").Value;
---
So does fetching the CDATA section's value like this actually translate from
UTF-8 encoding to UTF-16, because all strings in .NET are UTF-16? (I thought
it just byte-copied the contents of the CDATA section, so I would get
a UTF-8 encoded string in the "text" variable.)
And because the string is translated to UTF-16, it gets stored correctly in
the database when I pass it as a parameter to the database?
OR
The string that is the XML gets passed to my webservice as a parameter that
I load into an XML document. As mentioned earlier, I set my website to use
UTF-8 when sending/receiving data. Is the string actually transformed from
UTF-8 to UTF-16 (because all strings are UTF-16 in .NET) before it is
passed to my webmethod?
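
One way to probe the first alternative is a small test along these lines. It is only a sketch under assumptions: the class name, the helper and the sample XML are invented here, and the XML is loaded from raw UTF-8 bytes rather than from a string parameter as in my real webmethod. Only the XPath expression is taken from the snippet above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlDecodeCheck
{
    // Hypothetical helper: parse UTF-8 XML bytes and return the CDATA text.
    public static string ReadQuestionText(byte[] utf8Xml)
    {
        XmlDocument xmlDocument = new XmlDocument();
        using (MemoryStream stream = new MemoryStream(utf8Xml))
        {
            // The parser decodes the bytes into characters while loading.
            xmlDocument.Load(stream);
        }
        return xmlDocument.SelectSingleNode("QuestionText/text()").Value;
    }

    public static void Main()
    {
        // Made-up sample document containing a euro sign (U+20AC) in CDATA.
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
                     "<QuestionText><![CDATA[\u20AC]]></QuestionText>";
        byte[] wire = Encoding.UTF8.GetBytes(xml);

        string text = ReadQuestionText(wire);

        // One UTF-16 code unit, not the three UTF-8 bytes that were parsed.
        Console.WriteLine(text.Length);   // 1
    }
}
```

In this sketch the value is not a byte copy of the CDATA section: the decoding has already happened by the time the document is loaded. When the XML arrives as a string parameter, the decoding has presumably happened even earlier, when the request bytes were read.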

I mean, I have a UTF-8 encoded string when I send it on the wire, and
somewhere it gets translated to a UTF-16 string that is stored in the
database as UCS-2 (UTF-16 to UCS-2 and back is handled by the driver, as I
understand it).
The bytes of my original UTF-8 string are not the same as the bytes stored
in the UCS-2 string in the database so they have to get translated
somewhere, I'm just trying to understand where this translation occurs.

I know I could just drop it because it works, but I'm too curious about what
actually happens.

Kind Regards,
Allan Ebdrup.

