Re: Poor plan choice

From: Gert-Jan Strik (sorry_at_toomuchspamalready.nl)
Date: 11/19/04


Date: Fri, 19 Nov 2004 19:48:51 +0100

Thomas, see inline

"Thomas R. Hummel" wrote:
>
> Hello,
>
> A couple days ago one of our queries suddenly started to perform
> abyssmally. The query is pretty straightforward - it joins several
> tables all on foreign keys and includes a GROUP BY with COUNT(*). I
> looked over the query plan and it looked a little odd so I tried
> cutting the query down to see where the issue might be. I eventually
> came up with the following:
>
> SELECT COUNT(*)
> FROM Table1 T1
> INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
> WHERE T1.my_date = '2004-11-18'
>
> The table and column names have been changed to protect the innocent,
> but that is the exact format of the tables. Table1 has about 35M
> records. Table2 has about 6.5M records. For the date in question,
> Table1 has about 165K records.
>
> There is a non-clustered index on T1.my_date and there is a clustered
> index on T2.table2_id.

Consider adding a nonclustered index on T1(my_date,table2_id). This will
prevent the (quite expensive) bookmark lookups.

> The query plan for this simple query does an index seek on T1.my_date
> as I expected then it does a bookmark lookup (presumably because it
> needs T1.table2_id). It then includes parallelism, a hash, and then a
> bitmap creation. Meanwhile, it does an index scan using an index on
> Table2 that includes a single column that isn't even mentioned in the
> query(?!?!). It then uses parallelism and does a hash match/inner
> join.

Apparently SQL-Server estimates that the parallel plan will be faster.
If you expect differently, then you could add the hint OPTION (MAXDOP 1)
to force the serial plan.

Since the index on T2(table2_id) is clustered it is very wide at the
page level. In this case, SQL-Server estimates that it is faster to scan
a nonclustered index of table T2 (which also includes the clustered
index key) than it is to seek (or partially scan) the clustered index
for the estimated rows of the query.

Hope this helps,
Gert-Jan

> I've done UPDATE STATISTICS using WITH FULLSCAN for both tables and
> I've done a DBCC CHECKTABLE on both tables. Neither had any effect. I
> also tried to force the query to use the clustered index for Table2.
> For the simple query above it doesn't seem to help performance as the
> clustered index scan has a very large cost to it (I'm not sure that I
> entirely understand why). In the original query it helps substantially
> though. Instead of joining the 6.5M records to a lookup table first it
> joins it to Table1 first, which cuts down the number of records to the
> 165K before going about with other joins.
>
> What I'm looking for is any advice on other things that I can look at
> or any ideas on why SQL Server might be making these kinds of choices.
> I would have thought that the simple query above would have performed
> much better than it is currently (~30-35 seconds). I realize that
> there has to be a bookmark lookup, but I was still expecting a quick
> response from the server based on the indexes.
>
> Because of the table sizes, etc. I don't expect anyone to reproduce my
> results, so please don't ask me to provide DDL for all of the tables
> involved. If you have some ideas or even just guesses great, if not
> then that's ok too.
>
> Thanks,
> -Tom.



Relevant Pages

  • Re: IS NULL on field is not using index placed on that field
    ... > criteria in your WHERE clause the query engine needs to look through 2 ... > NULL" you get a nice quick plan using your nonclustered index too. ... > efficiently by just scanning the whole clustered index until it's got ... > even took out the null values and made the DeliverDate column NOT NULL ...
    (microsoft.public.sqlserver.programming)
  • Re: Whats the deal with PAGEIOLATCH_SH? - PAGEIOLATCHIssue20040426.zip (0/1)
    ... That table has no primary key and no clustered index. ... This table used to have very poorly chosen clustered index, ... SQL Server is certainly capable of handling ... >being missing) is messing up the main query. ...
    (microsoft.public.sqlserver.programming)
  • Re: Leftmost column in an index
    ... That's one example where the optimizer creates these _WA% statistics to ... which is the cost of a table scan. ... But this query: ... >> would likely change to a table scan or clustered index scan. ...
    (microsoft.public.sqlserver.programming)
  • Re: Index chosen dependant on result columns
    ... The result columns have a great deal to do with the optimal query ... with column A, the key of the clustered index of the table, A, to refer ... The cost to use the nonclustered index will include 10000 lookups in the ... instead chooses the plan that simply scans the entire clustered index, ...
    (microsoft.public.sqlserver.programming)
  • Re: Index chosen is wrong index
    ... constraints/indexes, do you have a clustered index (if so, what is its ... and the union can affect the query plan. ... > The execution plan for the query below selects the wrong ... > SDDRQJ, SDTRDJ, SDPDDJ, SDOPDJ, ...
    (microsoft.public.sqlserver.server)

Loading