Re: Covering indexes versus column order in Delaney
From: Hugo Kornelis (hugo_at_pe_NO_rFact.in_SPAM_fo)
Date: 11/05/04
- Next message: Steve: "Re: Error--Update Staement"
- Previous message: tshad: "Insert with response"
- In reply to: DW: "Re: Covering indexes versus column order in Delaney"
- Messages sorted by: [ date ] [ thread ]
Date: Fri, 05 Nov 2004 23:25:35 +0100
On Fri, 05 Nov 2004 09:25:12 -0800, DW wrote:
>> The assumption here is that this index is NOT a covering index, and
>> that there are other columns in the query that are not part of the
>> index.
>>
>
>Yes, but that index will at least tell SQL which data pages it needs to
>fetch, right? It doesn't need to scan the whole table, at least, and it's
>better to have this noncovering index than not to have it, I hope! The
>comment that this index is "not useful" is still a bit confusing. If it's
>not useful, then would the performance be the same if the index were
>dropped?
Hi DW,
Basically: yes - if the index is not useful to ANY query or statement,
your data retrieval speed won't be hurt by dropping it and your data
modification speed will actually increase. If it's useful in some queries
and not useful in others, it simply won't be used in the latter group.
To understand why a non-covering index is not useful in many cases, let's
look at a simple example. Let's suppose you have a table with 100,000
rows. Let's further assume that the row length in 400 bytes, so there is
room for 20 rows in a 8K data page. If each page if 50% full, the complete
table will take 10,000 pages. Therefore, the cost of a table scan (or
rather: a clustered index scan) is 10,000 page reads (plus some overhead
from the non-leaf levels of the index, but that's insignificant).
Let's further assume you have a nonclustered index that needs 20 bytes per
index entry (index value plus row locator, which is the corresponding
clustered index value). There's room for 400 index entries in an index
page, so the index will occupy only 250 pages. The starting cost of an
index scan is therefore 250 pages (plus overhead for the non-leaf levels,
probably 1 page in this case - again, insignificant).
For a covering index, the starting cost is also the total cost: you see
how many page reads are saved by using this index! For a non-covering
index, the complete rows have to be found as well. This is called a
bookmark lookup and it involves traversing the B-tree of the clustered
index to the leaf level, where the data lives. For most tables, the
clustered index' B-tree is three or four levels deep; let's assume our
example table needs just 3. This means a total of 3 page reads for each
bookmark lookup. Since the rows are processed in order of the nonclustered
index, they won't be in order for the clustered index - chances are you'll
have to read the same data page several times. The non-leaf levels of the
clustered index will probably stay in cache; the 10,000 data pages won't
and will require another physical read.
Before we can calculate the total cost of the query using a noncovering
index, we need to know the selectivity of the WHERE clause: how many rows
can be disregarded based on comparing the values that are in the index to
the WHERE clause?
If only 1% of the rows needs to be fetched, the total cost of the query
will be 250 pages (noncovering index) + 1% * 100,000 * 3 pages (bookmark
lookup), for a total of 3,250 pages. 2,000 page reads can assumed to come
from cache, 1,250 physical reads remain. This is still significant less
than the table scan, so the index will be used if the selectivity is
expected to be around 1%.
If 35% of the rows match, the total cost increases to 250 pages (for the
index) + 35% * 100,000 * 3 pages (bookmark lookup) for a total of 105,250
pages, 70,000 logical and 35,250 physical. In this case, it will be much
cheaper to simply scan the table and disregard the noncovering index. It
is "not useful" for this query.
>My situation is that I have a transaction table with a composite key
>consisting of account number, transaction date, and transaction sequence
>number, and I often need to look at all of last month's transactions (and
>do further sub-selects within those). So my question really is, since I
>have that composite key, and it includes an index on transaction date, will
>it help for me to add an additional, separate index on the transaction
>date?
First thing I'd look into is changing the order of columns on the existing
composite key. Finding last month's transactions will benefit greatly if
the composite key were defined as transactiondate, account number,
sequence number (or transaction date, sequence number, account number),
especially if this would also be a clustered index. Of course, other
queries that search for transactions by account number will suffer from
this change; you'll have to test it, then balance the pros and cons.
Another option would be to leave the current composite index in the order
that it has, but make it nonclustered and create a new, clustered
nonunique index on transaction date. Again, this will also influence all
other queries and all data modification statements, so you'll have to test
before you decide.
Adding a nonclustered index on transaction date will probably not help you
(at least not with queries that need last month's transactions): filtering
on month is probably not selective enough to push the cost of using this
index plus a bookmark lookup below the cost of scanning the whole table.
Best, Hugo
-- (Remove _NO_ and _SPAM_ to get my e-mail address)
- Next message: Steve: "Re: Error--Update Staement"
- Previous message: tshad: "Insert with response"
- In reply to: DW: "Re: Covering indexes versus column order in Delaney"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|