Re: XmlDocument and utf-8



On Jun 18, 3:29 am, stch...@xxxxxxxxxxxxxxxxxxxx (Steven Cheng[MSFT])
wrote:
Yes, both "UTF-8" and "utf-8" are OK for the charset in the XML declaration
section. The .NET Framework XmlDocument always converts the charset
value to lower case for consistency.

In addition, the <?xml ....?> declaration's charset value is only a
suggestion for some XML processing programs; the actual
charset/encoding of an XML document/file still relies on how you write
out the document (through a file I/O API). In other words, the actual
charset/encoding of an XML file may be different from the charset
declared in the <?xml ....?> section.

It's not really a "suggestion" - it's the encoding which should be
used to parse the rest of the document. If you claim (in the
declaration) to use UTF-8 and actually use some other encoding, XML
parsers are almost certainly going to fail to understand the data in
the way you expect.
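
As a rough illustration of that point, here is a minimal sketch. It uses
Python's standard-library xml.etree.ElementTree parser rather than
XmlDocument, so it shows the general XML behaviour and not anything
specific to .NET. The helper name parse_name and the sample document are
made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; \u00e9 is a non-ASCII character (e with acute).
TEMPLATE = '<?xml version="1.0" encoding="{declared}"?><name>caf\u00e9</name>'


def parse_name(declared, actual):
    """Declare one encoding, write the bytes in another, then parse them."""
    data = TEMPLATE.format(declared=declared).encode(actual)
    return ET.fromstring(data).text


# Declaration matches the bytes; the case of the encoding name does not matter.
matching_upper = parse_name("UTF-8", "utf-8")
matching_lower = parse_name("utf-8", "utf-8")

# Declares ISO-8859-1 but the bytes are UTF-8: the parse succeeds,
# yet the non-ASCII character is silently misread.
misread = parse_name("ISO-8859-1", "utf-8")

# Declares UTF-8 but the bytes are ISO-8859-1: the byte sequence is not
# valid UTF-8, so the parser rejects the document.
try:
    parse_name("UTF-8", "iso-8859-1")
    rejected = False
except ET.ParseError:
    rejected = True
```

In this sketch the mismatch either breaks parsing outright or yields
different text from what was written, depending on which way round the
mismatch is. A parser has to rely on the declaration (or a byte order
mark) to know how to decode the bytes in the first place.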

Jon
