Re: What's the deal with PAGEIOLATCH_SH?
From: James Bradley (jbradley_at_isa-og.com)
Date: 04/26/04
- Next message: Isaac Alexander: "Re: SQL Server Performance Monitoring"
- Previous message: Vishal Parkar: "Re: simple cursor question"
- In reply to: Greg Linwood: "Re: What's the deal with PAGEIOLATCH_SH?"
- Next in thread: Andrew J. Kelly: "Re: What's the deal with PAGEIOLATCH_SH?"
- Messages sorted by: [ date ] [ thread ]
Date: Mon, 26 Apr 2004 15:24:23 -0600
Ok, I think I might be on to something - I'm just not sure what :-)
Table1 is a fairly large table (5M rows, 20GB). I rebuilt this table, but only took the first 500K
rows, so it is 10% the size.
I tried running my update statement and it now takes 1 *second* to complete, updating 7226 rows!
That is over 7000 times faster, so the table size is definitely triggering the problem.
Does SQL Server have trouble when a single table is this size? The server has 2GB RAM and SQL
server is free to use as much memory as it wishes (currently 1.6GB).
Thanks,
Jami
On Mon, 26 Apr 2004 11:07:06 +1000, "Greg Linwood" <g_linwoodQhotmail.com> wrote:
>Hi James
>
>Can you please send the full DDL for these two tables, including create
>table, indexes, keys, relations & DBCC SHOW_STATISTICS for these tables /
>indexes? I'm sceptible about it's choice of a hash join to perform this
>work.
>
>Another question for you - does this work need to be performed atomically as
>a single unit of work or is it possible to break it into two or three
>batches? This may increase performance tremendously.
>
>I'd suggest that the query written differently might yield a better plan,
>eg:
>
>UPDATE Table1
>SET hasChildren=1
>WHERE ID in (SELECT parentID FROM Table2)
>
>or
>
>UPDATE Table1
>SET hasChildren=1
>WHERE exists (SELECT * FROM Table2 where parentid = Table1.id)
>
>I'd suggest that batching updates might help even further if either of those
>re-written queries did yield better plans.
>
>Regards,
>Greg Linwood
>SQL Server MVP
>
>"James Bradley" <jbradley@isa-og.com> wrote in message
>news:o03o801ne924b1kr7im1rkrv61sj3gn4i3@4ax.com...
>> I took a look at the disk queue length (all of this work is on a
>> single temp drive, not on the RAID), and it is happily running about
>> 0.7 for most of the query, with occasional spike to about 13 for a few
>> seconds. About 20 minutes into the query everything goes bad. The
>> disk I/O *drops* to 1MB/second in and out, and the disk queue length
>> jumps to 60-70 and stays there! CPU utilization stays low -- less
>> than 20% of one CPU.
>>
>> That sounds like a disk I/O problem to me, but basic test of reading
>> and writing the disk seem to work just fine (good queue lengths and
>> data rates). One thing I have noticed in PerfMon is when disk writes
>> go over some threshold, which seems to vary, the disk queue length
>> skyrockets. Sometimes I see this occur around 10MB/s write, just now
>> I the queue length going > 100 when the writes go over 4MB/s. Bizarre
>> behavior :-)
>>
>> I'll work on getting the real execution plan, instead of the estimated
>> one out of Query Analyzer, but I have never let the command finish (I
>> gave up after two hours). When I try a smaller table size (say 500K
>> rows instead of 4.6M), the query executes very quickly, and I don't
>> see these problems. This makes me think that maybe it is an in-memory
>> versus out-of-memory execution issue, except that the CPU utilization
>> is much too low.
>>
>> Here is the current statement I'm using:
>> UPDATE Table1
>> SET hasChildren=1
>> FROM
>> (
>> SELECT DISTINCT parentID FROM Table2
>> ) T
>> WHERE Table1.ID = Table2.parentID
>>
>> The tables are indexed (NC) on ID and parentID. There are no triggers
>> on Table1 and hasChildren is not reference by any index, clustered or
>> otherwise.
>>
>> I don't know how to post the estimated execution plan, but I can
>> describe it pretty easily.
>>
>> 1. Index scan on Table1 (31% of time - this is the big table)
>> 2. Index scan on Table2 (0% of time - this is a small lookup table)
>> 3. Hash Match/Right Semi Join (67% of time).
>>
>> #3 above shows an estimated row count of 888K, and a row size of 24.
>>
>> Total cost of the entire execution is 48.0 for 888K rows.
>>
>> Thanks for any tips!
>>
>> Jami
>>
>> On Sun, 25 Apr 2004 14:12:28 +1000, "Greg Linwood"
>> <g_linwoodQhotmail.com> wrote:
>>
>> >A fairly generally benchmark is up 3 to 4 queued requests per spindle are
>> >ok, but more than that on average represents the disk subsystem lagging.
>It
>> >seems that you're system's not experiencing this given this basic
>> >observation. I always like to take a quick "flashing lights" or audible
>> >noises from the disk box to make sure that the counters are working
>properly
>> >though (:
>> >
>> >SQL Server performs much of it's work in cache, so the absence of disk
>i/o
>> >doesn't equate to an absence of work - the next thing I'd suggest you
>check
>> >is the actual execution plans for those queries. To do this, use the
>> >profiler if your queries are being submitted by an application or select
>the
>> >Query / Show Execution Plan in Query Analyser if you're submitting the
>> >queries from there. Look for the most costly part of the query & look for
>> >scans or inefficient joins such as hash joins. It may well be that your
>> >query is executing completely in memory, but running on an inefficient
>plan?
>> >The observations you've made on the pageiolatch_sh waitstate may be
>> >incidental to the overall query execution plan being inefficient. If
>you're
>> >not sure on analysing execution plans, perhaps post the output from the
>> >query with "set statistics profile on" & we might be able to help there..
>> >
>> >Regards,
>> >Greg Linwood
>> >SQL Server MVP
>> >
>> >"James Bradley" <jbradley@isa-og.com> wrote in message
>> >news:krbm805uq3f2mkelo8adpn6ti9iuvbh3er@4ax.com...
>> >> Thanks Greg.
>> >>
>> >> DB ID 9 is my target database (not the source), and the disk I/O does
>> >> seem to behave unusually. When I watch the bytes in/out I see it
>> >> cranking along at 20-60MB/sec normally for about 20 minutes, then it
>> >> just hits a brick wall and drops to 1MB/sec. I'll check the queue
>> >> lengths - if they are small (< 1), then I presume it is SQL. If they
>> >> are over 1, then it is waiting for the disk system. Correct?
>> >>
>> >> Jami
>> >>
>> >> On Sun, 25 Apr 2004 11:39:00 +1000, "Greg Linwood"
>> >> <g_linwoodQhotmail.com> wrote:
>> >>
>> >> >I noticed in your original post that you asked how to identify if the
>> >> >contention was in tempdb & I didn't answer that. I'm not sure where
>that
>> >> >resource id came from: 9:1:1072396 but that's likely db#9,
>filegroup#1,
>> >> >object#1072396. You cacn translate the db# in master..sysdatabases,
>then
>> >the
>> >> >objectid from whatever database's sysobjects.
>> >> >
>> >> >Given you don't have any blocking, I'd suggest that you use Perfmon to
>> >check
>> >> >the physical disk object's read / write queue lengths to see if
>they're
>> >> >high. The reason I'm saying this is that you've said you're not
>getting
>> >the
>> >> >throughput you're expecting. If SQL Server is queuing more requests
>than
>> >the
>> >> >disk subsystem can keep up with, the Avg. Disk Read / Write Queue
>Length
>> >> >counters will identify this. If that's your problem, then you likely
>have
>> >> >only one of two problems - check the query plan to make sure that SQL
>> >Server
>> >> >isn't creating excessive i/o requests by reading more data than it
>neads
>> >(eg
>> >> >table scans or other inefficient query plans) or you might have to
>tune
>> >the
>> >> >disk sub-system.
>> >> >
>> >> >What "high" means for these counters depends on what disk
>configuration
>> >you
>> >> >have, but basically, if more than a few requests per physical disk
>> >spindle
>> >> >are queued on average then your disk system isn't keeping up or sql
>> >server's
>> >> >making too many requests..
>> >> >
>> >> >Regards,
>> >> >Greg Linwood
>> >> >SQL Server MVP
>> >> >
>> >> >"James Bradley" <jbradley@isa-og.com> wrote in message
>> >> >news:rl1l8091v3pg99t0iirj3h3stamvhsfroq@4ax.com...
>> >> >> From what I understand, a latch is a lightweight lock, which seems
>> >> >> fine to me. I checked the spids from Enterprise Manager and there
>is
>> >> >> no blocking/blocked by listed. When I looked, there were only two
>> >> >> connections to the databases - one by Query Analyzer when it was
>> >> >> running my command, and the second by Enterprise Manager when I was
>> >> >> looking at what was happening.
>> >> >>
>> >> >> Could the PAGIOLATCH_SH be waiting for data to come in from the
>disk?
>> >> >> I was monitoring total disk I/O and it was only 1MB/sec each way,
>not
>> >> >> like the 20-60MB/sec that I was expecting.
>> >> >>
>> >> >> I really need to figure out what is causing these delays, because I
>> >> >> can't wait hours for a statement that should take 20 minutes :-(
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Jami
>> >> >>
>> >> >> On Sat, 24 Apr 2004 09:30:17 +1000, "Greg Linwood"
>> >> >> <g_linwoodQhotmail.com> wrote:
>> >> >>
>> >> >> >Hi James
>> >> >> >
>> >> >> >A PAGEIOLATCH_SH simply indicates that the connection your
>statement
>> >is
>> >> >> >running on is waiting on acquiring a shared lock on a storage page.
>
>> >There
>> >> >> >are a variety of reasons why this could be occurring, but the most
>> >common
>> >> >is
>> >> >> >blocking. Have you checked sp_lock to see if the spid is blocked by
>> >> >another
>> >> >> >spid? You can also easily see if the spid is actually doing work by
>> >> >checking
>> >> >> >the CPU activity at the same time.
>> >> >> >
>> >> >> >Regards,
>> >> >> >Greg Linwood
>> >> >> >SQL Server MVP
>> >> >> >
>> >> >> >"James Bradley" <jbradley@isa-og.com> wrote in message
>> >> >> >news:ti3j801v1otant21ph1f5l8pq3t4apm4bv@4ax.com...
>> >> >> >> I'm seeing some strange performance problems with my SQL server
>(8.0
>> >> >> >> SP3). I have some large data conversion work to do and sometimes
>> >the
>> >> >> >> server runs pretty quickly, and sometimes is simply appears to go
>> >idle
>> >> >> >> in the middle of a long statement!
>> >> >> >>
>> >> >> >> Whenever I check the current activity on the process, I'm seeing
>> >that
>> >> >> >> it is waiting (usually 50-400 mSec) for a PAGEIOLATCH_SH. The
>> >system
>> >> >> >> has no load (1-5% CPU, 1MB/sec disk I/O). I am using two
>databases,
>> >> >> >> and I am the only user on those two. There are 3-5 other users
>on a
>> >> >> >> different database, usually idle, and we are running reporting
>> >> >> >> services.
>> >> >> >>
>> >> >> >> What is the server waiting for? Is there some conflict with
>tempdb?
>> >> >> >> Is there some command to figure out what the resource is that it
>is
>> >> >> >> wairing for (e.g., 9:1:1072396)?
>> >> >> >>
>> >> >> >> Thanks!
>> >> >> >>
>> >> >> >> Jami
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
- Next message: Isaac Alexander: "Re: SQL Server Performance Monitoring"
- Previous message: Vishal Parkar: "Re: simple cursor question"
- In reply to: Greg Linwood: "Re: What's the deal with PAGEIOLATCH_SH?"
- Next in thread: Andrew J. Kelly: "Re: What's the deal with PAGEIOLATCH_SH?"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|