Re: IS NULL on field is not using index placed on that field

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Mike Hodgson (mike.hodgson_at_mallesons.nospam.com)
Date: 02/23/05


Date: Wed, 23 Feb 2005 23:55:43 +1100

Upon reflection, most of this sounds like crap to me now (not sure what
I was on at the time). As such, disregard most of it. (Not having a
good day.)

--
*mike hodgson* |/ database administrator/ | mallesons stephen jaques
*T* +61 (2) 9296 3668 |* F* +61 (2) 9296 3885 |* M* +61 (408) 675 907
*E* mailto:mike.hodgson@mallesons.nospam.com |* W* http://www.mallesons.com
Mike Hodgson wrote:
> Indexes (clustered and nonclustered) include nulls (they're just 
> another value, albeit a special one).  But, when you have the 2 
> criteria in your WHERE clause the query engine needs to look through 2 
> separate ranges of values in the index (the null values and the 
> '2004-2-3' values)...sort of.  So that's 2 seek operations (plus the 
> associated lookups into the clustered index to get the rest of the 
> data you request in your column list).  So that's twice as much work 
> as if you just have the single condition in your WHERE clause 
> (DeliverDate = '2004-2-3').  If you just say "WHERE DeliverDate IS 
> NULL" you get a nice quick plan using your nonclustered index too.  
> When the query optimiser comes up with an execution plan (in your 
> case) it figures out pretty quickly that it can get the data much more 
> efficiently by just scanning the whole clustered index until it's got 
> all the data it needs, rather than flicking through the nonclustered 
> index (twice) and looking up the matching leaf nodes in the clustered 
> index as well.
>
> You'd have to test it out to be sure but I suspect the UNION ALL that 
> David suggested would not really be any more efficient just because on 
> the surface of it I'm guessing it would do pretty much the same 
> operations (just in 2 separate SELECT statements) as the single query 
> with the OR.  The OR operator can be nasty and should be avoided if 
> possible but sometimes it has it's place.
>
> The only way the nulls in the index affect the index selection process 
> is that they serve as yet another value to sort in the index.  At 
> least I think that's the only affect they have.  With the non-null 
> values, the optimiser can reorder the WHERE clauses so that it doesn't 
> have to keep reversing direction in the index to find the 
> noncontinuous rows; not so with the nulls it would seem.
>
> If you have a big enough, fragmented enough index it will get avoided 
> all together.  I repeated the exercise but included a 3rd column 
> (char(3000)), just to pad out the rows so I'd get lots of page splits 
> & index fragmentation, and inserted 3600 rows instead of the 10 rows I 
> had before (the 10 rows would have been on 1 page but the 3600 rows, 
> each 3000+ bytes, would be split over at least 1800 pages - 2 rows per 
> page).  Selectivity was still the same but the optimiser thought (in 
> all 3 cases (where date = x, where date = x or date is null, and where 
> date is null)) that it would be better to just do a table scan.  I 
> even took out the null values and made the DeliverDate column NOT NULL 
> and it made no difference.
>
> Sometimes you just have to trust that the optimiser knows best (which 
> it usually does).  Of course, this could all be crap (I'm getting 
> tired) but it sounds logical to me.
>
> --
> *mike hodgson* |/ database administrator/ | mallesons stephen jaques
> *T* +61 (2) 9296 3668 |* F* +61 (2) 9296 3885 |* M* +61 (408) 675 907
> *E* mailto:mike.hodgson@mallesons.nospam.com |* W* 
> http://www.mallesons.com
>
>
>
> Adrian Bezzina wrote:
>
>>Mike,
>>
>>Thanks for the prompt reply.
>>
>>after executing (i know the tables and fields have changed) the query on the 
>>following table:
>>
>>Consign (table)
>>    |
>>    |_ [UNIQUE] GUID Field - PrimaryKey.
>>    |
>>    |_ ORIGINAL_DATE DateTime Field - Clustered Index on this field alone.
>>    |
>>    |_ DELIVER_DATE DateTime Field - non clustered index on this field.
>>
>>set showplan_text on
>>go
>>
>>SELECT * FROM Consign WHERE Deliver_date = '2004-2-3'
>>SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS 
>>NULL
>>go
>>
>>set showplan_text off
>>go
>>
>>the plan for the first query is doing a 'Index Scan' on the non-clustered 
>>index on the deliver date field
>>
>>the plan for the second query is doing a 'Clustered Index Scan' on a 
>>clustered index  - original_date and i suppose this is the equivalent to a 
>>table scan as I have moved the clustered index off the primary key to a more 
>>ordered field (in this case the original_date field).
>>
>>I suppose what I am really asking is:
>>
>>Are null values placed in an Index ? and if so does that effect the qeury if 
>>IS NULL (or IS NOT NULL) affect the selection of which index to use (sorry 
>>about changing the table structure, but i am showing you the indexes in 
>>question).
>>
>>Regards,
>>Adrian.
>>
>>--------------------------
>>
>>"Mike Hodgson" <mike.hodgson@mallesons.nospam.com> wrote in message 
>>news:e$tD6MWGFHA.1932@TK2MSFTNGP14.phx.gbl...
>>What does it do?  A table/clustered index scan?  We don't really have enough 
>>info to answer - can you post the execution plans & the DDL (table schema 
>>and schema for all indexes on the table)?  To get the execution plans just 
>>turn on showplan_text before the select statements and off after the select 
>>statements.  That is,
>>
>>set showplan_text on
>>go
>>
>>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS 
>>NULL
>>go
>>
>>set showplan_text off
>>go
>>
>>When I tried it on a tmp table (I had only a nonclustered index on 
>>DeliverDate) it chose:
>>
>>a nonclustered index seek for:
>>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>>
>>a table scan for:
>>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS 
>>NULL
>>
>>a nonclustered index seek for:
>>SELECT * FROM Deliveries WHERE DeliverDate IS NULL
>>
>>I guess the optimiser decided that it was more efficient to scan the table 
>>once than to do 2 seek passes (+ associated lookups) in the DeliverDate 
>>nonclustered index for the query with the 2 where criteria.  When I added a 
>>clustered index on OrderID, the table scan operation turned into a scan on 
>>the DeliverDate nonclustered index because the index suddenly became a 
>>covering index (since all nonclustered indexes also contain the clustering 
>>key, thereby implicitly making the DeliverDate index really (DeliverDate, 
>>OrderID)).
>>
>>My quick tests only had 10 rows of data.  It would make a difference if I 
>>was running the SELECT queries against a bigger table.
>>
>>--
>>
>>mike hodgson | database administrator | mallesons stephen jaques
>>T +61 (2) 9296 3668 | F +61 (2) 9296 3885 | M +61 (408) 675 907
>>E mailto:mike.hodgson@mallesons.nospam.com | W http://www.mallesons.com
>>
>>
>>Adrian Bezzina wrote:
>>Hi All,
>>
>>I have the table structure as follows
>>
>>->Deliveries (table)
>>        |__OrderID (Guid)
>>        |__DeliverDate (DateTime) (ALLOW NULLS)
>>
>>Index is on DeliverDate (ASC)
>>
>>60000 records
>>
>>look at the query plan for the following and is using the proper index:
>>
>>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>>
>>If I now change the query to:
>>
>>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>>NULL
>>
>>The query plan does not choose the DeliverDate index anymore?
>>
>>What is the reason for this?
>>
>>Regards,
>>Adrian.
>>
>>
>> 
>>
>>
>>  
>>


Relevant Pages

  • Re: Index chosen dependant on result columns
    ... The result columns have a great deal to do with the optimal query ... with column A, the key of the clustered index of the table, A, to refer ... The cost to use the nonclustered index will include 10000 lookups in the ... instead chooses the plan that simply scans the entire clustered index, ...
    (microsoft.public.sqlserver.programming)
  • Re: IS NULL on field is not using index placed on that field
    ... Can you specify which index operation it is using on the clustered index. ... > But i am still intriugued to know why the first query (this table has now been built from ... >> best-estimate plan. ... >>>SELECT * FROM Deliveries WHERE DeliverDate IS NULL ...
    (microsoft.public.sqlserver.programming)
  • Re: IS NULL on field is not using index placed on that field
    ... DECLARE @deliverDate AS DATETIME ... > the plan for the first query is doing a 'Index Scan' on the non-clustered ... > When I tried it on a tmp table (I had only a nonclustered index on ...
    (microsoft.public.sqlserver.programming)
  • Re: Index chosen dependant on result columns
    ... the clustered index seek was so slow that the program would actually timeout ... > The result columns have a great deal to do with the optimal query ... Your nonclustered index on B,C is like a table that contains ... Suppose the optimizer ...
    (microsoft.public.sqlserver.programming)
  • Re: Clustered index vs. nonclustered index for GUID primary key
    ... The table, or the clustered index ... If we now add a nonclustered index on column D, ... It may, however, store the data from some columns ... >>the query could use a clustered index. ...
    (microsoft.public.sqlserver.programming)