Re: XML whackyness



oDOM.Load loads the file directly, whereas oDOM.LoadXML loads a string. You
might be using the wrong method, or this could be just a typo in your last
post.

It is essential that you don't read the XML file manually and then pass the
resulting string to the DOM. Reading the file yourself would interpret the
XML data as being in the default ANSI character set rather than UTF-8, so
the magic bytes end up in the string as ordinary characters and nothing
handles them for you. You must get the DOM to load this XML directly from
the file itself, using the oDOM.Load method. It will then see those magic
bytes and know to treat the XML data as being in the UTF-8 character set.
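
The difference can be reproduced outside MSXML. The following is only a
rough sketch in Python using its standard-library parser, not the MSXML API
discussed here; the sample document and all variable names are made up for
illustration, and Windows-1252 stands in for "the default ANSI character
set". It parses the same bytes twice: once handing the parser the raw bytes
(the analogue of oDOM.Load), and once decoding them by hand first (the
analogue of reading the file into a string and calling LoadXML).

```python
import codecs
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical file content: the 3-byte UTF-8 marker followed by a tiny document.
raw = codecs.BOM_UTF8 + (
    '<?xml version="1.0" encoding="utf-8"?><name>café</name>'.encode("utf-8")
)

# Case 1: give the parser the raw bytes. It recognises the marker itself
# and decodes the rest of the data as UTF-8.
doc = minidom.parseString(raw)
direct_text = doc.documentElement.firstChild.data

# Case 2: decode the bytes manually as ANSI (assumed here to be Windows-1252)
# and pass the resulting string. The marker bytes have now become three
# ordinary characters sitting in front of the XML declaration.
as_ansi = raw.decode("cp1252")
try:
    minidom.parseString(as_ansi)
    string_load_failed = False
except ExpatError:
    string_load_failed = True

print(repr(as_ansi[:3]))   # the stray characters seen before the first "<"
print(direct_text)
print(string_load_failed)
```

In the second case the parser never gets a chance to see the marker as
bytes, which is the same reason passing a manually read string to the DOM
goes wrong.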

Tony Proctor

"Frank Rizzo" <none@xxxxxxxx> wrote in message
news:#KS08Cy3FHA.1188@xxxxxxxxxxxxxxxxxxxxxxx
> Tony Proctor wrote:
>
> >How are you looking at the file Frank? Assuming these to be the special
> >3-byte sequence identifying the file content as UTF-8 then I would not
> >expect them to be visible in, say, Notepad. It automatically filters them
> >out and uses them to decide how the rest of the file data should be
> >interpreted (i.e. as UTF-8 rather than the default ANSI character set).
> >
> >Similarly, you don't need to do anything special when loading the XML
> >file with MSXML since it interprets those bytes for you.
> >
> >We rely on this feature a lot, but we've never had to filter anything out
> >ourselves. It all works pretty well for us.
> >
> >
> Well, if you actually read the file contents into a string and then pass
> it to xmlDoc.Load, it generates an error.
>
> >Out of interest, how are you generating these files?
> >
> >
> I am not generating them. They come from some entity outside of the
> company (don't know really).
>
> > Tony Proctor
> >
> >"Frank Rizzo" <none@xxxxxxxx> wrote in message
> >news:uUcL93l3FHA.1276@xxxxxxxxxxxxxxxxxxxxxxx
> >
> >
> >>Every now and then when I open an XML file, I'll see various
> >>miscellaneous characters before the first less-than bracket. Like today
> >>I've seen ascii 254 and 255, yesterday I saw . What are these
> >>characters? Are they garbage? Can I ignore them? The reason I am
> >>asking is because there is a routine that reads the text from XML file
> >>into memory and passes me the string. When the string has these whacky
> >>characters, domDocument.LoadXml method fails.
> >>
> >>Thanks
> >>
> >>
> >
> >
> >
> >

