Re: Why is this

From: Steve Kass (skass_at_drew.edu)
Date: 10/21/04


Date: Thu, 21 Oct 2004 18:43:07 -0400

Troy,

  The additional conditions complicate the situation, and your
single-column indexes aren't as helpful as they could be for handling
queries with conditions on more than one column. Certainly one solution
would be to have a single index on (lastrecon, acctnum, deposit) [or
possibly on (acctnum, lastrecon, deposit), if acctnum = 'donotexist'
rows are much rarer than < '2004-01-01' rows]. Such an index would
cover the query and handle the restrictive condition immediately. The
optimizer should choose it, but you could resort to an index hint. An
index on (lastrecon, acctnum) or the reverse should also work in this
case, though the optimizer might not be as keen to use it.

  The problem is still bad rowcount estimates, but unfortunately, no
column statistics are infallible when it comes to conditions involving
two or more columns, so you can't fix the problem by fixing the
statistics. Statistics only indicate how common individual column
values are. I'll give one plausible scenario - without seeing the query
plan, I can't be precise, but I am pretty sure the additional index will
help. A clue that I'm close would be that the two plans use a different
index on the initial seek operator leading into the bookmark operator.

  Suppose there are many 'doesnotexist' rows in the table, and also many
rows with dates before '2004-01-01', but all the 'donotexist' rows have
recent lastrecon dates, and there are none from before this year. The
optimizer will expect to find the TOP 1 row you seek quickly, no matter
how it searches the data, because each condition alone is satisfied by
many rows of the table - but unfortunately the optimizer has no idea the
conditions are exclusive.

  Because you only have single-column indexes to support the two-column
WHERE condition, there is no perfect query plan. The actual plan
probably uses the lastrecon index, so that the first row found is the
TOP 1 .. in ORDER BY lastrecon. That index doesn't have the value of
acctnum, so every row from before 2004 has to be looked up in the table
to get the acctnum value. That lookup is the bookmark operator you see
in the query plan. It's an expensive operation, but worth trying if it
doesn't have to be done many times. Unfortunately, instead of finding
'donotexist' within the first few or few hundred lookups, all pre-2004
rows are looked up with no match, and the actual cost far exceeds the
estimated cost (I'm not sure if the bad rowcount estimate will show on
the operator leading into the bookmark operator or the bookmark operator).

  The reason TOP 5 goes faster is because while it was cost-effective to
use the index for TOP 1, even if it used quite a few bookmark lookups,
TOP 5, which uses 5 times as many, is not practical. That tips the
balance to a different query plan, perhaps one that uses the acctnum
index instead - maybe there were 3,000,000 pre-2004 rows looked up in
the bad actual plan for TOP 1, but only 5,000 'donotexist' rows are in
the database, and the worst case for this alternate plan is not so
terrible. With a single index to cover both conditions, you will only
use bookmark lookups for [deposit], and not even then, if you add
deposit to the index - you will find out immediately that there are no
qualifying rows if there is an index that can zero in on your two-column
joint WHERE condition.

  I hope that helps - if it's not possible for you to add an index, you
can use an index hint to specify the index the good query used, or even
specify the clustered index. Searching 4 million rows with a scan
instead of millions of separate bookmark lookups shouldn't take 4
minutes - the bookmark lookups are terrible because they are scattered
all over the place, and may hit each data page multiple times, quite
possibly causing all kinds of page faults, if the entire table doesn't
fit in cache.

SK

Troy Murphy wrote:

>I am trying to apply a solution to your terrific explanation:
>
>the query is:
>-select top 1 deposit from accthist where acctnum='doesnotexist' and
>lastrecon<'2004-01-01' order by lastrecon
>there is a nonclustered index on acctnum and lastrecon and the result set
>will be empty.
>This query takes 4:06 (4 minutes)
>changing the query to:
>-select top 5 deposit from accthist where acctnum='doesnotexist' and
>lastrecon<'2004-01-01' order by lastrecon
>This query takes less than 1 second
>I tried the command:
>-create statistics recons on accthist (acctnum,lastrecon) with fullscan
> The result is the same and the top 1 query takes over 4 minutes.
>the Execution Plan looks similar for both queries in that a bookmark lookup
>had a cost of 100% but the quicker query had an estimated row count more
>than double the slow query but had a signficantly smaller number of actual
>rows.
>
>
>"Steve Kass" <skass@drew.edu> wrote in message
>news:%23vSa18vtEHA.3112@TK2MSFTNGP10.phx.gbl...
>
>
>>Troy,
>>
>> My guess is that there is a nonclustered index on the dateadded column,
>>and the optimizer's rowcount estimate for the date condition is incorrect,
>>which leads to a suboptimal query plan.
>>
>>Your query without TOP asks for all rows where dateadded>'2010-12-23'
>>Your query with TOP 1 or TOP 2 asks for any one or two rows from the table
>>where dateadded>'2010-12-23'
>>
>>The TOP * query may be using the index on dateadd, but the TOP 1 query may
>>simply be searching the table or an index containing d as a column other
>>than the first, assuming enough rows meet the date condition that one will
>>be found quickly. If the average row size in the table is large, using
>>the index for SELECT * may be practical, even if it's necessary to look up
>>many rows in the main table.
>>
>>You should be able to see the problem by viewing the estimated execution
>>plans. The slow TOP 1 plan will not be using the index on dateadded, but
>>the TOP * plan will, perhaps - you could also see where the estimated
>>rowcount is wrong (it should be 1 or very close to 1 if the query returns
>>no rows).
>>
>>I'm assuming the row count is wrong because I don't think this particular
>>scenario will occur when the estimated row count is 0 or 1. Statistics
>>can lead to incorrect rowcount estimates if they are based on too small a
>>sample or are out-of-date, so to resolve the problem, if you need to, you
>>could try updating the statistics, or specifying that a larger fraction of
>>rows be used for the statistics involving the dateadded column.
>>
>>Steve Kass
>>Drew University
>>
>>
>>Troy Murphy wrote:
>>
>>
>>
>>>Why does this happen:
>>>
>>>I have a table with about 4 million records.
>>>Ex: accounts = Name, Account, DateAdded,...
>>>If I run a query with no results
>>> select * from accounts where dateadded>'2010-12-23'
>>>I get an empty result set immediately.
>>>
>>>However,
>>>If I run the query as:
>>> select top 1 * from accounts where dateadded>'2010-12-23'
>>>the query runs for a very long time.
>>>
>>>The same for top 2, but top 3 or greater returns the empty result set
>>>immediately.
>>>
>>>What gives?
>>>
>>>Troy
>>>
>>>
>>>
>>>
>
>
>
>



Relevant Pages

  • Re: Poor plan choice
    ... but while that would help with the pared down query that I came up ... This should prevent bookmark lookups for these large ... > Apparently SQL-Server estimates that the parallel plan will be faster. ...
    (microsoft.public.sqlserver.server)
  • Re: Query optimizer issue
    ... Consider a stored proc defined as follows: ... when compiling a plan. ... parameter @p1 into the query at compile time before a plan for the query ... execution plans, but a key requirement for everything to work as expected ...
    (microsoft.public.sqlserver.server)
  • Re: Execute SQL UD Function call timeout, 1-3 seconds in query ana
    ... We did build a little ASP.NET page yesterday that lets us put a given sql ... query into a text box and then return the execution plan (using Showplan_Text ... Query Analyzer. ...
    (microsoft.public.dotnet.framework.adonet)
  • Re: Store procedure vs Direct statement ???
    ... Try opening up query analyzer and checking out the query plan or ... advantages of using stored procedures over not using stored procedures is the ... the better the overall performance of your SQL ... dynamic sql versus the execution plan for a stored procedure. ...
    (microsoft.public.dotnet.framework.adonet)
  • Re: SP runs slower than same query in Query analyzer. Why??
    ... Andrew your reply was useful but I'm quite new to sql server and I'm ... query gives me the results in 3-4 seconds. ... > Keep in mind that the server has to compile a complete execution plan ... > when compiling a plan. ...
    (microsoft.public.sqlserver.clients)