Re: XML performance extremely slow for no obvious reason

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance



Dr Pizza,

I think can shed a little light on this, at least. Some of this has
nothing to do with XML, but with the inability of the optimizer
to deal with parameters like your @col1 that cause there to
be either 1 result or 10000, depending on whether @col1 is
null or not. But XML is also an issue, because there, too,
the optimizer can't deal, but you can do less about it, I think.

To evaluate the cost of each available execution plan, the optimizer
has to estimate the number of rows of [data] satisfying the condition

where (id = @id or @id is null)
and
(
@idlist is null
or
id in
(
select ParamValues.id.value('.','uniqueidentifier')
from @idlist.nodes('/idlist/id') as ParamValues(id)
)
)
and
(col1 like @col1 or @col1 is null)
and ...



In one case you have 3 "like or null" conditions. In the other case
you have 9 of them. In both cases, these conditions are joined
with ANDs.

Because these conditions are not SARGs, the optimizer has little to
go on when it estimates these row counts. The SQL Server optimizer
uses the following heuristic: assume that any non-SARG is satisfied by
about 1/3 of the rows being checked, and assume the conditions are
independent.

With three conditions to meet, the optimizer estimates the row count
at about 1/27 of the rows. With nine, however, it estimates the row
count at 1/20000 of the rows. (No matter how low the estimate, it
will always estimate that one row or more meets all conditions.)
For a 10000-row table, this is the difference between about
370 rows and 1 row.

For both queries, one possible plan is a nested join, where the upper
input is the qualifying rows of [data]. The lower input will be repeated
as many times as there are rows in the upper input.

In the lower input for this potential execution plan contains two TVFs
that have to do with the XML - probably one for value() and one
for nodes(). Though there's a nested loop join shown, I don't think
anything really multiplies right here. For each iteration of the (outermost)
nested loop join, the number of times these TVFs are called looks
to be once for each item in the XML variable (50 times in your
case). Nothing is known about the contents of this XML variable
at optimization time, and another generic estimate is used: I think the
estimate is 900.

As it turns out, your 9 conditions are satisfied by all the rows of
the [data] table, and so instead of the estimated 900 table-valued
function calls (1 row in the top input and 900 calls for that row),
there are 500000 (10000 rows in the top input, and 50 function
calls per row). So the optimizer underestimated the cost of this
part of the query by a factor of 600 and picks this plan instead
of the one that is actually faster, given your data.

What can you do? Unfortunately, you might not be able to do
much. While I was able to get the optimizer to think there
were 10000 matching rows for the big condition (by adding
some ORed conditions that were never true but not SARGs,
like CHECKSUM(id)%2 > 2 ), other optimizer guesses
I couldn't change - those having to do with the XML.

I don't know the context for your post, but if you have enough
control over the @idlist that you can deal with it without using
XML, I think you'll be better off. You could either put the id
values into a table before you use them:

select <the ids> into #ids from <the xml>
select count(*) from ... join #ids on ...
[assuming you can do this in two steps]

or you could require that they come to you in non-XML format,
packed together so that you can pull apart with a table of numbers.

declare @ids varbinary(max)
set @ids = cast(@guid1 as binary(16)) + cast(@guid2 as binary(16)) + ...
-- be sure not to cast the string representations instead of the uniqueidentifier

-- if the @ids come in this way, this may be much better:
select count(*) from ... join (
select cast(substring(@ids,n*16-15,16) as id
from permanentTableOfNumbers
where n between 1 and datalength(@ids) / 16
) IDs
on IDs.id = data.id
and <other conditions>

There may be much simpler ways to handle this, but I hoped
my rambling was interesting, if not useful...


-- Steve Kass
-- Drew University
-- http://www.stevekass.com
-- 7A5F4524-8E12-4FB7-9341-E4CBE7BFA002




Dr Pizza wrote:

Hi.

I'm using SQL 2005 (SP1 and SP2, doesn't make a difference) and am
seeing some very strange behaviour.

Here's a script to repro the issue:

------ BEGIN SCRIPT ------

if exists (select * from sys.objects where object_id =
object_id(N'[dbo].[data]') and type in (N'U'))
drop table data
go

create table data
(
id uniqueidentifier not null,
col1 nvarchar(255) null,
col2 nvarchar(255) null,
col3 nvarchar(255) null,
col4 nvarchar(255) null,
col5 nvarchar(255) null,
col6 nvarchar(255) null,
col7 nvarchar(255) null,
col8 nvarchar(255) null,
col9 nvarchar(255) null
)
go

if exists (select * from sys.objects where object_id =
object_id(N'[dbo].[counter]') and type in (N'U'))
drop table counter
go

create table counter
(
value int not null
)
go

insert into counter(value) values (0)
insert into counter(value) values (1)
insert into counter(value) values (2)
insert into counter(value) values (3)
insert into counter(value) values (4)
insert into counter(value) values (5)
insert into counter(value) values (6)
insert into counter(value) values (7)
insert into counter(value) values (8)
insert into counter(value) values (9)
go

insert into data(id)
select newid() from counter thousands cross join (counter hundreds cross join
(counter tens cross join counter units))
go

declare @idlist xml
set @idlist =
(
select tag, parent, [idlist!1!!ELEMENT], [id!2!!ELEMENT] from
(
select 1 as tag, null as parent, null as [idlist!1!!ELEMENT],
null as [id!2!!ELEMENT]
union all
select top 50 2 as tag, 1 as parent, null, id
from data
) as spuriousname
for xml explicit, type
)

declare @id uniqueidentifier declare @col1 nvarchar(255)
declare @col2 nvarchar(255)
declare @col3 nvarchar(255)
declare @col4 nvarchar(255)
declare @col5 nvarchar(255)
declare @col6 nvarchar(255)
declare @col7 nvarchar(255)
declare @col8 nvarchar(255)
declare @col9 nvarchar(255)

select count(*) from data
where (id = @id or @id is null)
and
(
@idlist is null
or
id in
(
select ParamValues.id.value('.','uniqueidentifier')
from @idlist.nodes('/idlist/id') as ParamValues(id)
)
)
and
(col1 like @col1 or @col1 is null)
and
(col2 like @col2 or @col2 is null)
and
(col3 like @col3 or @col3 is null)
-- comment out the rest of these conditions to see the speed
-- improve significantly and the execution plan looks much
-- better too
and
(col4 like @col4 or @col4 is null)
and
(col5 like @col5 or @col5 is null)
and
(col6 like @col6 or @col6 is null)
and
(col7 like @col7 or @col7 is null)
and
(col8 like @col8 or @col8 is null)
and
(col9 like @col9 or @col9 is null)

------ END SCRIPT ------

Now if you run that script and look at the execution plan for the
final select query, there will be a nested loop join taking in data
from the XML. That nested loop join will have to read in about a
million rows from one side of the join, and about two million from the
other, and the entire query executes quite slowly. Where these
millions of rows come from I don't know; the table has 10,000 rows,
and there are 50 IDs in the XML variable.

Then run the script, but without the extra conditions at the bottom
(i.e. remove eveything after the comment). Notice how this time the
nested loop join *doesn't* have to go through millions of rows, and
the performance is considerably better.

What's going on here? Why does adding the extra conditions make the
performance change from decent to unacceptable? It feels like
something's going wrong, but I don't know what.


.



Relevant Pages

  • can openxml write multiple fields - 1 row?
    ... quantity attributes from the XML document. ... declare @doc varchar ... FROM OPENXML ... This routine only generates one int ...
    (microsoft.public.sqlserver.xml)
  • can openxml write multiple fields - 1 row?
    ... I got an xml routine from the Sql Server NG where I can ... delimiters in the string. ... Declare @iDoc Int ...
    (microsoft.public.sqlserver.xml)
  • Re: Shredding XML
    ... to determine which field values should be extracted from an XML doc ... The stored procedure works, but the process of dynamically gathering ... DECLARE keycolumns_cursor CURSOR LOCAL ... DECLARE @xPath varchar ...
    (microsoft.public.sqlserver.xml)
  • Re: XML Update performance
    ... If I need up update 3-4 values in the xml document at one time(in one ... declare @start datetime ... create xml index pathIdx on dbo.docsusing xml index pxIdx for path ...
    (microsoft.public.sqlserver.xml)
  • Re: Attempt to de-mystify AJAX
    ... you could declare all your variables as Variant. ... the browser, and I still think of that as a mixed blessing. ... XML, which is far from being the most efficient data representation ... done as a fat client, ...
    (comp.databases.pick)