XML parsing speeds
- From: JRStern <JRStern@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 22 Oct 2007 12:11:00 -0700
Well, very interesting stuff, trying different syntax and mechanics!
This is based on a tiny XML message, as seen in recent messages from me.
Results on larger (2+ megabyte) will probably be very different - I saw a 60x
difference on one the other day, and in these tests, nothing of that
magnitude.
But I saw some very curious things.
Kent Tegels showed me how to even touch the XML in a table, and a million
thanks to him for that! He also supplied some nice xpath syntax that
minimized the cross applies I was throwing around madly. Kent's code is also
well-behaved!
select t1.c1.value('(../head/msgid/text())[1]','int') as msgID
,t1.c1.value('(./tid/text())[1]','nvarchar(8)') as tid
,t1.c1.value('(./tmsg/text())[1]','nvarchar(8)') as tmsg
from @xmsg.nodes('//text') as T1(c1)
select t1.c1.value('(../head/msgid/text())[1]','int') as msgID
,t1.c1.value('(./tid/text())[1]','nvarchar(8)') as tid
,t1.c1.value('(./tmsg/text())[1]','nvarchar(8)') as tmsg
from #mymsg
cross apply msg.nodes('//text') as T1(c1)
Run from a variable: 8,773ms
Run from an unindexed table: 12,480ms
Run from a table with primary index: 7,600ms
Run from a table with secondary path index: 5,396
Nice!
This is from a temp table, run on my little single-core laptop, in a loop
executing 10,000 times, the null loop time about 500ms.
Now, it's curious that the time from a variable is faster than an unindexed
table and slower than an indexed table, but OK, we can live with that!
Now, I also ran this against various versions of my klutzy (but fully
specified) path code, and I'll just list results for the best of them.
SELECT
T.rows.value('(head/msgid)[1]','VARCHAR(8)') as msgid,
T2.rows.value('(tid)[1]','VARCHAR(8)') as tid,
T2.rows.value('(tmsg)[1]','VARCHAR(8)') as tmsg
FROM @xmsg.nodes('/msgs/msg') as T(rows)
CROSS APPLY T.rows.nodes('text') as T2(rows)
SELECT
T.rows.value('(head/msgid)[1]','VARCHAR(8)') as msgid,
T2.rows.value('(tid)[1]','VARCHAR(8)') as tid,
T2.rows.value('(tmsg)[1]','VARCHAR(8)') as tmsg
FROM #mymsg
cross apply msg.nodes('/msg') as T(rows)
CROSS APPLY T.rows.nodes('text') as T2(rows)
Run from a variable: 5,530ms
Run from an unindexed table: 5,256ms
Run from a table with primary index: 6,200ms
Run from a table with secondary path index: 6,280ms
Huh?!?
Whatever the magnitude of the numbers, I'm somewhat amazed that it ran
better from a table with no indexes - at least certainly no worse. There's
maybe a five percent noise factor in these timings, near as I can make out.
Any comments anybody has on any of this are welcome.
Oh, my "bad" code was using a funny syntax I picked up somewhere, that
always ran slower - and was the source of the 60x slower problem I had the
other day:
SELECT
T.rows.query('head/msgid').value('.','VARCHAR(8)') as msgid,
T2.rows.query('tid').value('.','VARCHAR(8)') as tid,
T2.rows.query('tmsg').value('.','VARCHAR(8)') as tmsg
FROM @xmsg.nodes('/msgs/msg') as T(rows)
CROSS APPLY T.rows.nodes('text') as T2(rows)
Even when the values are specified to the leaf, apparently there can be a
lot of overhead in a cross apply if (the first, any one, too many?) of the
operands are specified as sets rather than scalars. Use the [1] syntax,
folks! The numbers for this were on the order of 2x slower than my good
syntax, for this tiny XML. Again, for the larger XML, it was returning
results 60x slower.
Final note, that the XML in the table was shredded to the first level in
being inserted into the table, so the results are not entirely comparable,
table speeds are perhaps a tad optimistic due to this - speaks even better
for working out of a variable.
Thanks to all who helped me get this far.
Josh
.
- Prev by Date: Re: So just why is this 60x faster?
- Next by Date: Complex Query to XML
- Previous by thread: Re: So just why is this 60x faster?
- Next by thread: Complex Query to XML
- Index(es):
Relevant Pages
|