XML parsing speeds

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance



Well, very interesting stuff, trying different syntax and mechanics!

This is based on a tiny XML message, as seen in recent messages from me.
Results on larger (2+ megabyte) will probably be very different - I saw a 60x
difference on one the other day, and in these tests, nothing of that
magnitude.

But I saw some very curious things.

Kent Tegels showed me how to even touch the XML in a table, and a million
thanks to him for that! He also supplied some nice xpath syntax that
minimized the cross applies I was throwing around madly. Kent's code is also
well-behaved!

select t1.c1.value('(../head/msgid/text())[1]','int') as msgID
,t1.c1.value('(./tid/text())[1]','nvarchar(8)') as tid
,t1.c1.value('(./tmsg/text())[1]','nvarchar(8)') as tmsg
from @xmsg.nodes('//text') as T1(c1)

select t1.c1.value('(../head/msgid/text())[1]','int') as msgID
,t1.c1.value('(./tid/text())[1]','nvarchar(8)') as tid
,t1.c1.value('(./tmsg/text())[1]','nvarchar(8)') as tmsg
from #mymsg
cross apply msg.nodes('//text') as T1(c1)

Run from a variable: 8,773ms
Run from an unindexed table: 12,480ms
Run from a table with primary index: 7,600ms
Run from a table with secondary path index: 5,396

Nice!

This is from a temp table, run on my little single-core laptop, in a loop
executing 10,000 times, the null loop time about 500ms.

Now, it's curious that the time from a variable is faster than an unindexed
table and slower than an indexed table, but OK, we can live with that!

Now, I also ran this against various versions of my klutzy (but fully
specified) path code, and I'll just list results for the best of them.

SELECT
T.rows.value('(head/msgid)[1]','VARCHAR(8)') as msgid,
T2.rows.value('(tid)[1]','VARCHAR(8)') as tid,
T2.rows.value('(tmsg)[1]','VARCHAR(8)') as tmsg
FROM @xmsg.nodes('/msgs/msg') as T(rows)
CROSS APPLY T.rows.nodes('text') as T2(rows)

SELECT
T.rows.value('(head/msgid)[1]','VARCHAR(8)') as msgid,
T2.rows.value('(tid)[1]','VARCHAR(8)') as tid,
T2.rows.value('(tmsg)[1]','VARCHAR(8)') as tmsg
FROM #mymsg
cross apply msg.nodes('/msg') as T(rows)
CROSS APPLY T.rows.nodes('text') as T2(rows)

Run from a variable: 5,530ms
Run from an unindexed table: 5,256ms
Run from a table with primary index: 6,200ms
Run from a table with secondary path index: 6,280ms

Huh?!?

Whatever the magnitude of the numbers, I'm somewhat amazed that it ran
better from a table with no indexes - at least certainly no worse. There's
maybe a five percent noise factor in these timings, near as I can make out.

Any comments anybody has on any of this are welcome.

Oh, my "bad" code was using a funny syntax I picked up somewhere, that
always ran slower - and was the source of the 60x slower problem I had the
other day:

SELECT
T.rows.query('head/msgid').value('.','VARCHAR(8)') as msgid,
T2.rows.query('tid').value('.','VARCHAR(8)') as tid,
T2.rows.query('tmsg').value('.','VARCHAR(8)') as tmsg
FROM @xmsg.nodes('/msgs/msg') as T(rows)
CROSS APPLY T.rows.nodes('text') as T2(rows)

Even when the values are specified to the leaf, apparently there can be a
lot of overhead in a cross apply if (the first, any one, too many?) of the
operands are specified as sets rather than scalars. Use the [1] syntax,
folks! The numbers for this were on the order of 2x slower than my good
syntax, for this tiny XML. Again, for the larger XML, it was returning
results 60x slower.

Final note, that the XML in the table was shredded to the first level in
being inserted into the table, so the results are not entirely comparable,
table speeds are perhaps a tad optimistic due to this - speaks even better
for working out of a variable.

Thanks to all who helped me get this far.

Josh


.



Relevant Pages

  • Re: Be afraid of XML
    ... whole of the JVM into XML). ... anything you do with s-expr is just XML with minor syntactic differences ... XML is a general tree syntax suitable for world-wide use. ... same syntax family as an enormously popular document markup language. ...
    (comp.lang.lisp)
  • new notation idea: SMXL...
    ... and XML, while combining some of the capabilities of both. ... The goal will involve a mild simplification of XML syntax, ... Will be considered an SNode or SForm. ... Will create an XNode, which may contain expressions. ...
    (comp.lang.misc)
  • Re: Data table text I/O package?
    ... > in XML, too, if you think this is an argument ...) ... >> Though is it about what syntax would be the best? ... but a huge readability loss. ... Distance isn't a record. ...
    (comp.lang.ada)
  • Re: bobcat
    ... could become less convenient than \section{Section heading}. ... Is it correct that the abstract syntax tree is the equivalent ... An XML backend will be trivial: ...
    (comp.text.tex)
  • Re: Extract and Insert Xml nodes
    ... Here is an update statement that should work (only tested with one XML instance in each column): ... WITH T1mid as ( ... FROM tab1 cross apply x.nodesas N ... FROM tab2 cross apply x.nodesas N1cross apply g.nodesas N ...
    (microsoft.public.sqlserver.xml)