Re: Extremely high % Disk Time .. How is this possible??

From: Andrew J. Kelly (sqlmvpnooospam_at_shadhawk.com)
Date: 02/10/05


Date: Thu, 10 Feb 2005 10:18:06 -0500

Completely agree with Geoff.

-- 
Andrew J. Kelly  SQL MVP
"Geoff N. Hiten" <SRDBA@Careerbuilder.com> wrote in message 
news:%237$Gb13DFHA.2824@tk2msftngp13.phx.gbl...
> As Andrew pointed out, it is the physical drives that appear to be 
> limited.
> Slicing them into logical drives may look better but it does not increase 
> IO
> capability.  Separating data and logs onto different physical disks with 
> the
> correct RAID layouts will increase performance.  Andrew also suggested
> profiling for performance to find if there are just a few really ugly
> queries that are killing your system.  I agree completely.
>
> As for the original question on Percent IO Time, that counter has been
> largely irrelevent since SCSI Overlapped IO was introduced way back when.
> Caching controllers really killed it.  I prefer to look at disk queue 
> length
> and read bytes/write bytes per second as true measures of disk 
> performance.
>
> I believe that you will be running with write cache off in a non-SAN 
> cluster
> to prevent data loss during a cluster failover.  This is a big performance
> killer and one major reason I do not recommend a cluster without a SAN. 
> All
> indicators are that you need to modify your IO system.  You may find other
> performance bottlenecks as you remove the IO limit, but for right now, 
> that
> is the worst offender.
>
> -- 
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
>
> I support the Professional Association for SQL Server
> www.sqlpass.org
>
> "Tim" <tim@pds.notanaddress.ca> wrote in message
> news:%23kwePe3DFHA.2608@TK2MSFTNGP10.phx.gbl...
>> I do have 5 logical drives, 1 for logs and 1 for dbases for each server,
> and
>> a quorum drive.  Been evaluating the benefits of adding a dedicated array
>> for tempDB and indexes.  Roughly 30-40 % of ransactions occur in tempDB
>> (still have a lot of tempDbases as it seems to be the most efficient way
> to
>> maintain/handle the passing of tables between stored procedures.  In many
>> cases, there is no easy way to eliminate the passing of the tables as a
>> couple of stored procedures (the big offenders) require a large amount of
>> processing.
>>
>>
>> "Andrew J. Kelly" <sqlmvpnooospam@shadhawk.com> wrote in message
>> news:e65DCOtDFHA.2632@TK2MSFTNGP12.phx.gbl...
>> > It sure sounds like your storage array is not able to handle the load.
>> > Having a Raid 1+0 is great but only having 6 drives is on the low end.
>> > Since you are not on a SAN you only have the cache on the controllers
>> > themselves to help deal with all the writes.  Make sure the cache on 
>> > the
>> > controllers is set to 100% write back and not read.  It also sounds 
>> > like
>> you
>> > only have a single drive array and everything is on it (logs , data and
>> > tempdb).  That also is not good.  You should have at least a Raid1 or
>> > Raid1+0 for just the log files.  The spikes can be caused by lots of
>> things
>> > but you are probably seeing some of this due to checkpoints.  You can
> use
>> > the Checkpoint pages per second perfmon counter to correlate this to 
>> > the
>> > disk queue spikes.  You might also try using connection pooling instead
> of
>> > that many individual connections.  You will need multiple pools to
> handle
>> > the different dbs but it should cut down the actual number of
> connections
>> > quite a bit.Have you ran any traces to see what kind of queries are
> being
>> > run and who the worst offenders are?  You may have a few really poorly
>> tuned
>> > queries that are hurting everything else.
>> >
>> >
>> > -- 
>> > Andrew J. Kelly  SQL MVP
>> >
>> >
>> > "Tim" <tim@pds.notanaddress.ca> wrote in message
>> > news:%23v7AhlsDFHA.2804@TK2MSFTNGP14.phx.gbl...
>> > > Hi Group
>> > >
>> > > SYSTEM:
>> > >
>> > > Clustered SQL Server 2000 on Windows 2000 (2 servers), with RAID 1+0,
> 6
>> > > drives (Total of 24 20GB hard drives with mirroring/striping).  4 GB
> RAM
>> /
>> > > Server.  All fiber back end, with Fibre Channel Storage Hub 7 ( @ 100
>> > > MB/sec ).  Our SQL servers have fiber cards @ 400 MB/sec connected to
>> hub.
>> > > About 50 DBases, ranging in size from about 50-100 MB up to 6 GB.
> They
>> > > are
>> > > shared by 15 clients (they are fairly balanced over a 24 hour period,
>> when
>> > > 1
>> > > client is busy, the other isn't).  The total size of dbases is ~ 65
> GB,
>> > > Log
>> > > File ~ 35 GB.  100-250 connected users (to our application), with
>> 300-500
>> > > SQL connections.  Dual XEON 2.8Ghz Processors / Server.
>> > >
>> > > PROBLEM:
>> > >
>> > > We are experiencing disk problems but are having a very difficult 
>> > > time
>> > > diagnosing 'exactly' what is the cause and solution.  On the data
>> drives,
>> > > thru PerfMon, the % Disk Times are averaging between 250 and 1500,
> with
>> > > peaks of over 30,000 (How do I get 30,000 % disk use??).  Average 
>> > > disk
>> > > queues are setting at about 100-250 during the problem times, but
>> > > occasionally see it spike well over 1000.  The transactions/sec hover
>> > > between 50-100.  Processor hovers between 10 and 40%, spiking to
> 70-80%.
>> > > The disk bytes/sec for the data drives average betwwen 1 and 6
>> MBytes/sec.
>> > >
>> > > I know there are a lot of variables that can be at play here, but
> where
>> > > should I start?  Is the system overall too small?  Over worked?? We
> did
>> > > have
>> > > memory issues but increased the RAM and there no longer appears to be
>> > > buffer
>> > > cache problems and we have not yet exceeded the available RAM.  We
> have
>> > > removed most of our temp tables (in favor of table variables) and
>> removed
>> > > a
>> > > lot of the cursors that were in the system.  We moved our reporting
>> server
>> > > to it's own box (completely seperate from prod servers).  We reduced
> the
>> > > log
>> > > file backup intervals and staggered our database backup jobs.  We
>> > > converted
>> > > portions of our reporting to Analysis services.  All with short term
>> > > marginal improvements.  We have to speed our system up before we 
>> > > start
>> > > losing clients.  Does it sound/appear as though the next option would
> be
>> > > to
>> > > scale out?  scale up?  Goto a SAN environment?  And info and advise
>> would
>> > > be
>> > > greatly appreciated!
>> > >
>> > > If you got this far, thanks for the time in reading this long post!!!
>> And
>> > > am
>> > > anxious to read any responses.  Thank you.
>> > >
>> > >
>> >
>> >
>>
>>
>
> 


Relevant Pages

  • Re: Extremely high % Disk Time .. How is this possible??
    ... As Andrew pointed out, it is the physical drives that appear to be limited. ... Slicing them into logical drives may look better but it does not increase IO ... and read bytes/write bytes per second as true measures of disk performance. ... Roughly 30-40 % of ransactions occur in tempDB ...
    (microsoft.public.sqlserver.clustering)
  • Re: How to improve SQL Server 2000 performance?
    ... understand how you were actually inserting these rows. ... Andrew J. Kelly SQL MVP ... >> Andrew J. Kelly SQL MVP ... >>>> disk I/O. ...
    (microsoft.public.sqlserver.setup)
  • Re: How to improve SQL Server 2000 performance?
    ... > Hi Andrew, thanks for the advice. ... Your current and avg disk queues are most likely very high ... How are you inserting these rows? ... >> Andrew J. Kelly SQL MVP ...
    (microsoft.public.sqlserver.setup)
  • Re: Volume/Partition labels
    ... is no reason to turn off the pagefile as Andrew has inaccurately posted. ... Running on a HP Pentium 4 with two hard drives. ... >> BIOS designates the Master as Drive 0 and the Slave as Drive 1. ... >> the Slave as Disk 1. ...
    (microsoft.public.windowsxp.newusers)
  • Re: Filegroups and indexes
    ... you have disk queue issues first. ... Andrew J. Kelly SQL MVP "Leila" wrote in message ...
    (microsoft.public.sqlserver.server)