Re: Handling multiple schemas and large files in XML



MikeB wrote:

I originally considered defining a class for each schema version and
using the XmlSerializer class to construct the appropriate one from the
xml document. However, this is where another potential issue raises its
head: the xml files are rather large: 50+ MB and over 1 million lines.

I suspect that using the XmlSerializer with documents of this size is
probably not appropriate. Am I correct?

If you deserialize an XML document with XmlSerializer, you get .NET objects held in memory. It is hard to tell in advance how much memory a 50 MB document will consume, so you will have to run some tests, and of course you will also have to take into account what kind of systems the users of your application have. Nowadays PC systems are sold with 3 GB of RAM, so I wouldn't completely rule out using XmlSerializer to deserialize your large XML.
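
As a starting point for such a test, here is a minimal sketch that deserializes a stream and reports roughly how much the managed heap grew. The Records and Record classes and the element names are made up for illustration; your real classes would mirror one of your schema versions. GC.GetTotalMemory only gives an approximation, so treat the number as a rough guide.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical document shape, assumed for this sketch only.
[XmlRoot("records")]
public class Records
{
    [XmlElement("record")]
    public List<Record> Items = new List<Record>();
}

public class Record
{
    [XmlAttribute("id")]
    public int Id;

    [XmlElement("name")]
    public string Name;
}

public static class MemoryProbe
{
    // Deserializes the whole stream and reports the approximate
    // growth of the managed heap caused by the resulting objects.
    public static Records Load(Stream input, out long bytesUsed)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Records));

        // Passing true forces a collection first so the baseline is stable.
        long before = GC.GetTotalMemory(true);
        Records result = (Records)serializer.Deserialize(input);
        bytesUsed = GC.GetTotalMemory(true) - before;

        return result;
    }
}
```

Call MemoryProbe.Load with a FileStream over one of your real 50 MB files on a machine comparable to what your users have, and compare the reported figure with the file size.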


Bearing this in mind, I could construct the object model by using an
XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
is that AIUI, I will then have to manually handle the schema differences.

Note that since .NET 2.0, instantiating XmlTextReader directly is no longer the recommended approach; you should create an XmlReader with XmlReader.Create and suitable XmlReaderSettings.
Other than that you are right: XmlReader is fast but forward-only, which keeps its memory footprint low, so it is the .NET XML API of choice for parsing large XML documents.
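
A minimal sketch of that approach, assuming you just want to walk the document once. The ElementCounter class and the choice of settings are only for illustration; pick the XmlReaderSettings that fit your documents.

```csharp
using System.IO;
using System.Xml;

public static class ElementCounter
{
    // Counts elements with the given local name in a single forward-only
    // pass, so only the current node is held in memory at any time.
    public static int Count(TextReader input, string localName)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;  // skip insignificant whitespace nodes
        settings.IgnoreComments = true;    // skip comment nodes

        int count = 0;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == localName)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

For a file on disk you would pass a StreamReader, or use the XmlReader.Create overload that takes a file name or URI.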
You can, however, combine XmlReader with other APIs such as XPathDocument/XPathNavigator, XmlSerializer, or LINQ to XML (in .NET 3.5): read through the whole document with XmlReader, but pass subtrees on to the other API, which gives you more comfort or power to extract the data you are looking for.
For instance with LINQ to XML you have XNode.ReadFrom
http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom.aspx
to consume a subtree.
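
A rough sketch of that combination, assuming .NET 3.5 and a document whose repeating items all share one element name. The SubtreeStreamer class and the "record" name in the usage note are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class SubtreeStreamer
{
    // Yields each element with the given local name as an XElement,
    // one at a time, so only one subtree is materialized at once.
    public static IEnumerable<XElement> Stream(XmlReader reader, string localName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.LocalName == localName)
            {
                // ReadFrom consumes the whole subtree and leaves the reader
                // on the node after it, so we must not call Read() here or
                // an immediately following sibling would be skipped.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

You would then write something like foreach (XElement record in SubtreeStreamer.Stream(reader, "record")) and use LINQ to XML queries on each record. Handling the schema differences could happen at that level, for instance by checking which child elements are present in each subtree.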

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/