Re: What's the deal with PAGEIOLATCH_SH? - PAGEIOLATCHIssue20040426.zip (0/1)
From: James Bradley (jbradley_at_isa-og.com)
Date: Mon, 26 Apr 2004 18:10:35 -0600
That table has no primary key and no clustered index. This table is
currently being generated by joining other tables, but it will soon be
detached. I do have one field that is a good candidate for a PK -
shown as Idx2. I know it is a good practice to have a PK, but could
that affect the performance?
This table used to have very poorly chosen clustered index, which I
have since removed. If I were to cluster it, it would probably be by
Idx2 as well, but I don't see any real benefit for that. I don't have
any ranges on that ID anyway.
On Tue, 27 Apr 2004 08:54:15 +1000, "Greg Linwood"
>I can't see any Primary Keys on those tables - do they not have PKs or have
>you just missed this in the script? This is very important. It's also
>important to know if any PK / indexes are clustered indexes or
>Your isolation of 500k rows and execution of the update taking 1 second is
>also important - it suggests to me that there's something seriously wrong
>with the original query plan. SQL Server is certainly capable of handling
>large updates like this, but I'd suggest something fundamental (such as PK
>being missing) is messing up the main query.
>Use of a clustered index might also help. If the table has a clustered index
>on the correct column, the query optimizer may be able to isolate the 1.5m
>rows updated by this query using a range scan and treat them sort of as if
>they're in their own table. Getting the choice of the clustered PK can be
>made manually or with the Index Tuning Wizard, which is a smart tool that
>understands the SQL Query Optimizer's internal workings.
>To use the ITW, you put the update statement in the Query Analyser,
>highlight it, then select the Query / Index Tuning Wizard and follow the
>prompts. Try it with / without the "Keep all existing indexes" option
>ticked. This tool is very powerful for making index suggestions, but also be
>aware that when you use it like this, it's only tuning for the individual
>query you highlighted in QA & might make suggestions that are good for that
>query but not for the overall work done by the db. It can also be used with
>workloads captured by SQL Profiler for better overall index suggestions, but
>you'll probably get joy for now from just tuning for this query.
>There is a substantial difference between having a clustered index on a
>table or not - it affects how ALL indexes on the table behave during major
>update statements like yours, so I suggest you spend some time reading up on
>clustered indexes & their relationship to non-clustered indexes as this will
>help you to understand what's going on under the hood. There's lots of
>information in SQL Books Online & in SQL Server books on this important
>Not having Primary Key is such a major mistake that I'm assuming you've just
>missed this from your script - could you clarify that please?
>It would also be handy if you posted the actual query execution plan rather
>than the general statistics. Those stats aren't much use. The Query Plan
>gives us absolute information about precisely HOW the query is physically
>running & will likely lead to solving your problem. Do post this please -
>either in graphical or text format won't matter.
>SQL Server MVP
>"James Bradley" <email@example.com> wrote in message
>> Hi Greg
>> It took me a while, but I put together the create table/index calls
>> for these tables. I had to rename the fields because of
>> confidentiality issues.
>> I've also attached the output from
>> dbcc show_statistics for the relevant indices.
>> I ran the full update statement, it took 2 hours, 24 minutes. I've
>> attached the text output from the statistics for that run.
>> I tried all 3 queries (the current one and the two you suggested
>> below), and the query execution plans are identical!
>> I don't need to do this atomically - I have exclusive use of the
>> database when I need to run this. I could easily break it into
>> several batches.
>> One thing that is still bothering me -- let's say SQL is choosing a
>> bad execution plan (hash join, etc.). Wouldn't I see the server busy
>> doing this, even if it is inefficient? Right now, the server is not
>> limited by either CPU or Disk I/O, and I would assume one of those
>> must be limiting, unless locks/latches are blocking it.
>> Thanks for all the help!
>> On Mon, 26 Apr 2004 11:07:06 +1000, "Greg Linwood"
>> <g_linwoodQhotmail.com> wrote:
>> >Hi James
>> >Can you please send the full DDL for these two tables, including create
>> >table, indexes, keys, relations & DBCC SHOW_STATISTICS for these tables /
>> >indexes? I'm sceptible about it's choice of a hash join to perform this
>> >Another question for you - does this work need to be performed atomically
>> >a single unit of work or is it possible to break it into two or three
>> >batches? This may increase performance tremendously.
>> >I'd suggest that the query written differently might yield a better plan,
>> >UPDATE Table1
>> >SET hasChildren=1
>> >WHERE ID in (SELECT parentID FROM Table2)
>> >UPDATE Table1
>> >SET hasChildren=1
>> >WHERE exists (SELECT * FROM Table2 where parentid = Table1.id)
>> >I'd suggest that batching updates might help even further if either of
>> >re-written queries did yield better plans.
>> >Greg Linwood
>> >SQL Server MVP