Re: Indexes Confuses me!!

From: Itzik Ben-Gan (itzik_at_REMOVETHIS.SolidQualityLearning.com)
Date: 09/27/04


Date: Mon, 27 Sep 2004 18:22:27 -0400

Sure. Kalen Delaney has a column called Inside SQL Server in SQLMag
(www.sqlmag.com) where she often covers indexing issues. You may want to
start with the first issue dating back to March 1999.

-- 
BG, SQL Server MVP
www.SolidQualityLearning.com
"Drew" <agarwalp@lucent.com> wrote in message 
news:ejuoGiFpEHA.3464@tk2msftngp13.phx.gbl...
> Thanks Ben-Gan for your patience.
> Yes you are right in saying that i need a bit of reading. Do you know 
> about
> some online articles or notes relating to this?
> Also one more question
>> heap: 1 x n
>> clustered: L x n
> Should i infer that if you are using (Select * from table1) the heap is
> better than the clustered index and clustered index is more useful in
> situations like (Select * from table where col1='Drew') where clustered
> index is on col1.
>
> Thanks Again
>
>
> "Itzik Ben-Gan" <itzik@REMOVETHIS.SolidQualityLearning.com> wrote in 
> message
> news:u#quFvxoEHA.896@TK2MSFTNGP12.phx.gbl...
>> As David mentioned, Indexing is a big topic...
>>
>> > 1,002,500= 1,000,000  rows + 2500 index pages. Exactly whats happening
>> > here?
>>
>> A couple of things that might help you understand the calculations:
>>
>> The minimum I/O unit used by SQL Server is a page. So, if SQL Server 
>> needs
>> to read a certain row, it has to read the whole page first.
>> If you're using an index for filtering or sorting purposes, and the index
>> does not cover the query (meaning that the index does not include all
>> attributes referred to by the query), SQL Server grabs the row locator
> from
>> the index row, and performs a bookmark lookup. Bookmark lookup means
>> following the row locator to read the data row.
>>
>> Now, the row locator has a different format depending on whether the 
>> table
>> is clustered or not. If the table doesn't have a clustered index (aka
> heap),
>> the row locator is a physical pointer consisting File#:Page#:Row#. With
> such
>> a pointer, reading the actual data row costs a single logical page read.
>> If the table is clustered, the row locator is more logical; it contains
> the
>> clustered index key of the same row. So, a bookmark lookup in a clustered
>> table cannot locate the row directly with a single page read, rather, it
> has
>> to issue a whole seek operation within the clustered index. If the
> clustered
>> index has 3 levels, each bookmark lookup translates to 3 logical reads.
>>
>> So depending on whether the table is a heap or clustered (with L levels),
>> the total number of logical reads required to perform bookmark lookups 
>> for
> n
>> keys is:
>>
>> heap: 1 x n
>> clustered: L x n
>>
>> > Q1. >>clustered index has three levels. What does this mean.
>> > Q2. How does 3,002,500 comes out?
>>
>> 3 was just an example. An index might have a different number of levels
> than
>> 3. You really need to get familiar with Indexes and balanced trees to
> figure
>> this out. Recommended reading on the subject: Kalen's Inside SQL Server
> 2000
>> and Donald E. Knuth's 3rd volume of The Art of Computer Programming -
>> Sorting & Searching.
>> You can calculate the number of levels in an index if you know the number
> of
>> rows in the table (r), and the index row size (i), fragmentation aside 
>> for
>> now.
>> First calculate the number of index rows that fit in an index page, p =
> r/i.
>> So p is the number of rows that can reside in the root page of the tree
>> (level = 0). Since each index row points to a whole page in the next 
>> level
>> of the balanced tree, the number of index rows in level l of the index is
>> p^(l+1).
>> The leaf level of the index contains a key for each row from the base
> table,
>> so you can calculate the number of levels in the index:
>>
>> ceiling(log(i)r) = L
>>
>> If you also want to take index fragmentation into consideration, you need
> to
>> divide r by the average page density (f):
>>
>> ceiling(log(i)(r/f)) = L
>>
>> With fairly short index row sizes (~20 bytes), you'll end up with 3 or 4
>> index levels for any given table size in the range of millions to 
>> billions
>> of rows.
>> That's why I used 3 in my example.
>>
>> > Does this mean the number of data reads depends on number of data 
>> > pages?
>> > So
>> > if i have a Datapage with 10 rows and a Datapage with 100 rows, the
>> > efficiency is same.
>>
>> A couple of things...
>> Try to distinguish between just scanning data where you're after all rows
> of
>> a page and reading a row using a bookmark lookup. The former will read
> less
>> pages if more rows fit in a page, while the latter won't since a page 
>> will
>> be read for each target row.
>> Things get even more complicated when you consider the fact that a 
>> logical
>> read (a read from cache) is much more costly than a physical read (a read
>> from disk).
>> So, to answer your question, it's not the same when you're scanning data;
> it
>> is the same in terms of logical reads when you're using lookups, but 
>> since
>> you're reading less pages physically in total, there will be some
> difference
>> in this case as well.
>>
>> --
>> BG, SQL Server MVP
>> www.SolidQualityLearning.com
>>
>>
>> "Drew" <agarwalp@lucent.com> wrote in message
>> news:%235upYAwoEHA.3712@TK2MSFTNGP15.phx.gbl...
>> >>>Then, performing a bookmark lookup using relative identifiers (RIDs)
>> >>>costs
>> >>>one page logical read for each key, or 1 x 1,000,000. This plan costs 
>> >>>a
>> >>>total of 1,002,500, logical reads.
>> > 1,002,500= 1,000,000  rows + 2500 index pages. Exactly whats happening
>> > here?
>> >
>> >>>If the table has a clustered index on a column other than col1 and 
>> >>>that
>> >>>clustered index has three levels, scanning the leaf level of the
>> >>>nonclustered index in an ordered fashion costs 2,500 logical reads.
> Then,
>> >>>performing a bookmark lookup using clustering keys (i.e., a seek in 
>> >>>the
>> >>>clustered index) costs three page logical reads per key, or 3 x
>> >>>1,000,000.
>> >>>This plan costs a total of 3,002,500 logical reads.
>> >
>> > Q1. >>clustered index has three levels. What does this mean.
>> > Q2. How does 3,002,500 comes out?
>> >
>> >>>However, the I/O cost of a plan that doesn't use the nonclustered 
>> >>>index
>> >>>is
>> >>>50,000 logical reads for a table scan, plus the cost of sorting the
>> >>>data-much more efficient than plans that use the nonclustered index"
>> >
>> > Does this mean the number of data reads depends on number of data 
>> > pages?
>> > So
>> > if i have a Datapage with 10 rows and a Datapage with 100 rows, the
>> > efficiency is same.
>> >
>> > Thanks for your patience.
>> >
>> >
>> > "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in 
>> > message
>> > news:ujYnwxvoEHA.1300@TK2MSFTNGP12.phx.gbl...
>> >> In a nutshell, SQL Server's optimizer has to weigh-up the cost of
> reading
>> >> the index pages against the cost of scanning the table (or clustered
>> > index).
>> >> Using the index isn't always the fastest way to retrieve the data.
> Which
>> >> point of Itzik's article didn't you follow? If you want to understand
>> > index
>> >> optimizations you should study a good book as it is a complex topic.
>> > "Inside
>> >> SQL Server" (Delaney) goes into this in depth.
>> >>
>> >> --
>> >> David Portas
>> >> SQL Server MVP
>> >> --
>> >>
>> >>
>> >
>> >
>>
>>
>
> 


Relevant Pages

  • Re: Indexes Confuses me!!
    ... The minimum I/O unit used by SQL Server is a page. ... and performs a bookmark lookup. ... reading the actual data row costs a single logical page read. ... >>>performing a bookmark lookup using clustering keys (i.e., ...
    (microsoft.public.sqlserver.programming)
  • Re: Indexes Confuses me!!
    ... since in both cases SQL Server will probably scan the data by ... and performs a bookmark lookup. ... reading the actual data row costs a single logical page read. ... >>>>>performing a bookmark lookup using clustering keys (i.e., ...
    (microsoft.public.sqlserver.programming)
  • Re: Quorum Disk or Majority Node?
    ... SQL Server does not NATIVELY support MNS clustering. ... >> The chapter "Three-Site Majority Node Set Quorum in Geographically ...
    (microsoft.public.sqlserver.clustering)
  • RE: Clustering Set up
    ... Both SQL Server 2000 Failover Clustering and SQL Server 2000 Log Shipping are disaster recovery options. ...
    (microsoft.public.sqlserver.clustering)
  • Re: Using sql server failover clustering
    ... This says that NODE1 has a local disk and NODE2 has a local disk. ... For a clustered instance of SQL Server, ... I convince my boss to work on sql server failover clustering for better ... My database size is more then 20Gb and in some cases more then 100Gb. ...
    (microsoft.public.sqlserver.clustering)

Loading