Re: XmlTextreader versus DOM
From: Bruce Wood (brucewood_at_canada.com)
Date: 11/30/04
- Next message: Andrew Jacobs: "Re: Performance of XmlDocument.ImportNode"
- Previous message: Oleg Tkachenko [MVP]: "Re: SelectSingleNode with multiple namespaces (sort of)..."
- Next in thread: Amol Kher [MSFT]: "Re: XmlTextreader versus DOM"
- Reply: Amol Kher [MSFT]: "Re: XmlTextreader versus DOM"
- Messages sorted by: [ date ] [ thread ]
Date: 30 Nov 2004 12:38:57 -0800
Even outside the .NET world, there have been for some time two ways to
read XML. I've heard them referred to as "DOM" and "SAX".
"DOM" (short for "Document Object Model") parsers read the entire XML
document and build a representation of it as a hierarchy of objects in
memory. There are DOM parsers for Java, C, C++, and other languages as
well as the ones built into .NET.
DOM is, generally speaking, the easiest way in which to deal with XML
documents, but it has the disadvantage that it loads the entire
document into memory, which can be a problem if you have a
many-megabyte document.
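
To make the DOM style concrete, here is a minimal sketch using Python's
standard xml.dom.minidom module rather than the .NET classes. The
sample document, the element names and the customer_names function are
all made up for illustration. The point to notice is that the whole
tree exists in memory before the first query runs.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
XML = """<customers>
  <customer id="1"><name>Ann</name><address>1 Main St</address></customer>
  <customer id="2"><name>Bob</name><address>2 Oak Ave</address></customer>
</customers>"""


def customer_names(xml_text):
    """Return (id, name) pairs by walking a fully loaded DOM tree."""
    # parseString builds the complete object hierarchy before returning,
    # which is convenient but costs memory proportional to the document.
    doc = minidom.parseString(xml_text)
    names = []
    for customer in doc.getElementsByTagName("customer"):
        name_node = customer.getElementsByTagName("name")[0]
        names.append((customer.getAttribute("id"), name_node.firstChild.data))
    return names


if __name__ == "__main__":
    print(customer_names(XML))
```

Navigating is easy because every node knows its parent and children,
but nothing can be processed until the last byte has been parsed.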
If you are reading XML into ADO.NET you really don't have any choice
but to use DOM in some form because all of MS's automated
XML-to-ADO.NET tools read the entire document into a dataset.
"SAX" (short for "Simple API for XML") parsers read one XML
token at a time. You supply callback methods that the parser should
call when it encounters certain kinds of things in the document: the
start of an element, the end of an element, a run of text, and so on.
Your callback then checks, for example, whether the element that just
started is called "Address". SAX parsers are extremely resource
efficient, because they hold only one XML token at a time. However,
they leave it up to the calling application to maintain state. When
your start-of-element method is called for "Address", you have no idea
where in the document you are, only that you hit an element called
"Address". For this reason programming for SAX parsers can be a pain
in the ***.
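
The state-keeping burden is easier to see in code. This is a rough
sketch using Python's standard xml.sax module, not any .NET API. The
document, the AddressHandler class and the find_addresses function are
hypothetical. The handler has to keep its own stack of open elements
just to tell a billing address from a shipping address.

```python
import xml.sax

# Hypothetical sample document, invented for this illustration.
XML = b"""<order>
  <billing><address>1 Main St</address></billing>
  <shipping><address>2 Oak Ave</address></shipping>
</order>"""


class AddressHandler(xml.sax.ContentHandler):
    """Collects address text, keyed by the name of the enclosing element."""

    def __init__(self):
        super().__init__()
        self.path = []       # stack of open element names; SAX keeps none for us
        self.text = []       # text chunks of the address currently being read
        self.addresses = {}  # parent element name -> address text

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "address":
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.path and self.path[-1] == "address":
            self.text.append(content)

    def endElement(self, name):
        if name == "address":
            parent = self.path[-2]  # only known because we tracked the stack
            self.addresses[parent] = "".join(self.text)
        self.path.pop()


def find_addresses(xml_bytes):
    handler = AddressHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.addresses


if __name__ == "__main__":
    print(find_addresses(XML))
```

Memory use stays flat however large the input is, but all of the
bookkeeping in path and text is the application's problem.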
MS claims that XmlTextReader improves upon the SAX parser, but I
remember thinking that it really wasn't a leap forward in technology,
back when I was investigating .NET's XML support.
I ended up writing what I consider the best balance between SAX and
DOM, and something that I wish MS (and Java, and ... ) would include
in their standard libraries: a sequential, forward-only parser that
reads an XML document's repeating record content one DOM tree at a
time.
In brief, there are two kinds of XML documents. The first kind
represents documents with little or no repeating structure (such as MS
Word files). For these you use DOM. The second kind, which is very
common, represents large record sets, where each "record" has complex
substructure. DOM is
overkill for these, because you don't need all of the records in
memory at once: you're processing them serially, one-by-one. SAX,
however, is too simplistic and makes it difficult to work with each
record. What you really want is a parser that, given some information
about what constitutes a "record" in your XML document, reads one
"record" at a time into a mini DOM tree.
This is what I built for our own use here, and it works well. It reads
only a small portion of an XML file into memory at one time, but each
portion comes in as a DOM tree that is easy to work with.
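
This is not the parser described above, which is not shown here. As a
sketch of the same general idea, Python's standard xml.dom.pulldom
module can stream through a document and expand only the chosen
"record" elements into small DOM subtrees. The sample document and the
iter_records and order_skus helpers are hypothetical names made up for
this example.

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this illustration.
XML = """<orders>
  <order id="1"><item sku="A"/><item sku="B"/></order>
  <order id="2"><item sku="C"/></order>
</orders>"""


def iter_records(xml_text, record_tag):
    """Yield each record_tag element as its own small DOM subtree."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == record_tag:
            # Read forward to the matching end tag and attach the children,
            # so the caller gets a complete subtree for this record only.
            events.expandNode(node)
            yield node


def order_skus(xml_text):
    """Return (order id, list of item SKUs) for each order record."""
    result = []
    for order in iter_records(xml_text, "order"):
        skus = [item.getAttribute("sku")
                for item in order.getElementsByTagName("item")]
        result.append((order.getAttribute("id"), skus))
    return result


if __name__ == "__main__":
    print(order_skus(XML))
```

Each record can be queried with the usual DOM calls, and the caller
decides what counts as a record by passing the element name. How much
memory this actually saves depends on the record size and on dropping
references to records once they have been processed.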