Re: What's the deal with PAGEIOLATCH_SH? - PAGEIOLATCHIssue20040426.zip (0/1)

From: James Bradley (jbradley_at_isa-og.com)
Date: 04/27/04


Date: Mon, 26 Apr 2004 18:10:35 -0600

Hi Greg

That table has no primary key and no clustered index. This table is
currently being generated by joining other tables, but it will soon be
detached. I do have one field that is a good candidate for a PK -
shown as Idx2. I know it is a good practice to have a PK, but could
that affect the performance?

This table used to have very poorly chosen clustered index, which I
have since removed. If I were to cluster it, it would probably be by
Idx2 as well, but I don't see any real benefit for that. I don't have
any ranges on that ID anyway.

Thanks!

Jami

On Tue, 27 Apr 2004 08:54:15 +1000, "Greg Linwood"
<g_linwoodQhotmail.com> wrote:

>Hi James.
>
>I can't see any Primary Keys on those tables - do they not have PKs or have
>you just missed this in the script? This is very important. It's also
>important to know if any PK / indexes are clustered indexes or
>non-clustered.
>
>Your isolation of 500k rows and execution of the update taking 1 second is
>also important - it suggests to me that there's something seriously wrong
>with the original query plan. SQL Server is certainly capable of handling
>large updates like this, but I'd suggest something fundamental (such as PK
>being missing) is messing up the main query.
>
>Use of a clustered index might also help. If the table has a clustered index
>on the correct column, the query optimizer may be able to isolate the 1.5m
>rows updated by this query using a range scan and treat them sort of as if
>they're in their own table. Getting the choice of the clustered PK can be
>made manually or with the Index Tuning Wizard, which is a smart tool that
>understands the SQL Query Optimizer's internal workings.
>
>To use the ITW, you put the update statement in the Query Analyser,
>highlight it, then select the Query / Index Tuning Wizard and follow the
>prompts. Try it with / without the "Keep all existing indexes" option
>ticked. This tool is very powerful for making index suggestions, but also be
>aware that when you use it like this, it's only tuning for the individual
>query you highlighted in QA & might make suggestions that are good for that
>query but not for the overall work done by the db. It can also be used with
>workloads captured by SQL Profiler for better overall index suggestions, but
>you'll probably get joy for now from just tuning for this query.
>
>There is a substantial difference between having a clustered index on a
>table or not - it affects how ALL indexes on the table behave during major
>update statements like yours, so I suggest you spend some time reading up on
>clustered indexes & their relationship to non-clustered indexes as this will
>help you to understand what's going on under the hood. There's lots of
>information in SQL Books Online & in SQL Server books on this important
>topic.
>
>Not having Primary Key is such a major mistake that I'm assuming you've just
>missed this from your script - could you clarify that please?
>
>It would also be handy if you posted the actual query execution plan rather
>than the general statistics. Those stats aren't much use. The Query Plan
>gives us absolute information about precisely HOW the query is physically
>running & will likely lead to solving your problem. Do post this please -
>either in graphical or text format won't matter.
>
>Regards,
>Greg Linwood
>SQL Server MVP
>
>"James Bradley" <jbradley@isa-og.com> wrote in message
>news:gkaq80lj2914utqu0s9ts7nf220jau4nvf@4ax.com...
>> Hi Greg
>>
>> It took me a while, but I put together the create table/index calls
>> for these tables. I had to rename the fields because of
>> confidentiality issues.
>>
>> I've also attached the output from
>> dbcc show_statistics for the relevant indices.
>>
>> I ran the full update statement, it took 2 hours, 24 minutes. I've
>> attached the text output from the statistics for that run.
>>
>> I tried all 3 queries (the current one and the two you suggested
>> below), and the query execution plans are identical!
>>
>> I don't need to do this atomically - I have exclusive use of the
>> database when I need to run this. I could easily break it into
>> several batches.
>>
>> One thing that is still bothering me -- let's say SQL is choosing a
>> bad execution plan (hash join, etc.). Wouldn't I see the server busy
>> doing this, even if it is inefficient? Right now, the server is not
>> limited by either CPU or Disk I/O, and I would assume one of those
>> must be limiting, unless locks/latches are blocking it.
>>
>> Thanks for all the help!
>>
>> On Mon, 26 Apr 2004 11:07:06 +1000, "Greg Linwood"
>> <g_linwoodQhotmail.com> wrote:
>>
>> >Hi James
>> >
>> >Can you please send the full DDL for these two tables, including create
>> >table, indexes, keys, relations & DBCC SHOW_STATISTICS for these tables /
>> >indexes? I'm sceptible about it's choice of a hash join to perform this
>> >work.
>> >
>> >Another question for you - does this work need to be performed atomically
>as
>> >a single unit of work or is it possible to break it into two or three
>> >batches? This may increase performance tremendously.
>> >
>> >I'd suggest that the query written differently might yield a better plan,
>> >eg:
>> >
>> >UPDATE Table1
>> >SET hasChildren=1
>> >WHERE ID in (SELECT parentID FROM Table2)
>> >
>> >or
>> >
>> >UPDATE Table1
>> >SET hasChildren=1
>> >WHERE exists (SELECT * FROM Table2 where parentid = Table1.id)
>> >
>> >I'd suggest that batching updates might help even further if either of
>those
>> >re-written queries did yield better plans.
>> >
>> >Regards,
>> >Greg Linwood
>> >SQL Server MVP
>>
>



Relevant Pages

  • Re: table design for massive GPS data
    ... If the table is indexed well, then query performance will depend on the ... If the typical query is all the records for one vehicle on one calendar day ... If data arrives in real time, then a clustered index of date, vehicle, time, ... Your primary key is of course unique and doesn't allow NULL and by ...
    (comp.databases.ms-sqlserver)
  • Re: Move table without a clustered primary key?
    ... Now, SQL Server do not noccesarily need to move any of the actual data, ... but you *have the option* to move the data to whatever filegroup you specify in the MOVE TO option. ... Is this only relevant when the table is not going to have a clustered index, ... whether or not the clustered index supports a Primary Key. ...
    (microsoft.public.sqlserver.server)
  • Re: Clustered Index question
    ... Columnist, SQL Server Professional ... Without having primary key in the table, will it slow down the query that I ... the clustered index on PACKET_TIME, and instead have a clustered index on ...
    (microsoft.public.sqlserver.programming)
  • Re: Clustered Index question
    ... Columnist, SQL Server Professional ... If I keep the primary key, ... >select packet_time,packet_data from packet where ... >have the clustered index (either on PACKET_TIME or on ...
    (microsoft.public.sqlserver.programming)
  • Re: Speed question
    ... 99% of the time the query above is the one who will be run. ... I do not need this primary key for anything else, I will not query for it. ... have the clustered index (either on PACKET_TIME or on PACKET_CONTRACT, ... > memory the first time and the second time the pages are still in memory. ...
    (microsoft.public.sqlserver.programming)