Re: XML performance extremely slow for no obvious reason

Tech-Archive recommends: Fix windows errors by optimizing your registry



On Sat, 03 Mar 2007 22:57:07 -0500, Steve Kass <skass@xxxxxxxx> wrote:

I think can shed a little light on this, at least. Some of this has
nothing to do with XML, but with the inability of the optimizer
to deal with parameters like your @col1 that cause there to
be either 1 result or 10000, depending on whether @col1 is
null or not. But XML is also an issue, because there, too,
the optimizer can't deal, but you can do less about it, I think.

Further playing around suggests the XML makes it both better and worse,
depending.

I bumped up the data table to 100,000 rows, just to make issues more
apparent.

The reason behind the XML is to work around SQL Server's lack of arrays;
this is a stored procedure called through ADO.NET and the original
developer felt XML was the easiest way to pass multiple values in to the
SP. Seems fair enough, I think.

The example code generates the XML on the fly, of course. One thing I
tried out of curiosity was to replace the XML @idlist with a simple
subselect:

where (id = @id or @id is null)
and
(
@idlist is null
or
id in
(
select top (50) id from data
)
)

with all the other conditions removed. With the conditions removed, the
XML version is now considerably *faster* than with subselect above. Not
what I expected at all! The XML version takes under two seconds; the
subselect (which is equivalent, but without the conversion from XML to
tables) takes 33 seconds. A check of the execution plan shows that yet
again it's multiplying the table size and then having to read the best
part of 100 million rows. Now as far as I can tell, that should never be
the case. Reading 50 rows from a table should be quicker than reading 50
rows from XML, and given that that's the only difference between the two,
there's no good reason for the execution plans to differ (IMO).

The next thing was to try using a temp table:

select ParamValues.id.value('.','uniqueidentifier') as id
into #ids
from @idlist.nodes('/idlist/id') as ParamValues(id)

....

where (id = @id or @id is null)
and
(
@idlist is null
or
id in
(
select id from #ids
)
)

This seems to be far more consistent. The version with no other
conditions takes a couple of seconds, and even with all the conditions
added, it doesn't take more than about 12 seconds. And importantly,
there's no explosion in the number of rows scanned; it consistently scans
the right number of rows, no matter what.

There may be much simpler ways to handle this, but I hoped
my rambling was interesting, if not useful...

Thanks. The behaviour still seems downright weird (and totally
undesirable) but the workaround is satisfactory.
.



Relevant Pages

  • Re: XML performance extremely slow for no obvious reason
    ... But XML is also an issue, because there, too, ... the optimizer can't deal, but you can do less about it, I think. ... Though there's a nested loop join shown, ... declare @ids varbinary ...
    (microsoft.public.sqlserver.xml)
  • Re: XML in stored proc and execution plan
    ... I have not checked whether procedures with OPENXML in them are recompiled, ... My guess is that the XML is opaque to the ... optimizer, which thus may some standard guess, if you join the OPENXML ... If you want a recompile, at least in case of big XML documents, you could ...
    (microsoft.public.sqlserver.programming)
  • Can you reuse nested XML subqueries?
    ... SELECT Col1, ... FOR XML AUTO, TYPE) ... I figured out that you can write a scalar function that returns an xml type ... My first question is: Do UDF's have a performance penalty in this case? ...
    (microsoft.public.sqlserver.xml)
  • Re: Remove trailing char in concatenated rows
    ... FOR XML PATH) AS D (TEXTURES) ... ORDER BY col1; ... Plamen Ratchev ...
    (microsoft.public.sqlserver.programming)