Re: Extremely high % Disk Time .. How is this possible??

From: Geoff N. Hiten (SRDBA_at_Careerbuilder.com)
Date: 02/10/05


Date: Thu, 10 Feb 2005 09:32:30 -0500

As Andrew pointed out, it is the physical drives that appear to be limited.
Slicing them into logical drives may look better but it does not increase IO
capability. Separating data and logs onto different physical disks with the
correct RAID layouts will increase performance. Andrew also suggested
profiling for performance to find if there are just a few really ugly
queries that are killing your system. I agree completely.

As for the original question on Percent IO Time, that counter has been
largely irrelevent since SCSI Overlapped IO was introduced way back when.
Caching controllers really killed it. I prefer to look at disk queue length
and read bytes/write bytes per second as true measures of disk performance.

I believe that you will be running with write cache off in a non-SAN cluster
to prevent data loss during a cluster failover. This is a big performance
killer and one major reason I do not recommend a cluster without a SAN. All
indicators are that you need to modify your IO system. You may find other
performance bottlenecks as you remove the IO limit, but for right now, that
is the worst offender.

-- 
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"Tim" <tim@pds.notanaddress.ca> wrote in message
news:%23kwePe3DFHA.2608@TK2MSFTNGP10.phx.gbl...
> I do have 5 logical drives, 1 for logs and 1 for dbases for each server,
and
> a quorum drive.  Been evaluating the benefits of adding a dedicated array
> for tempDB and indexes.  Roughly 30-40 % of ransactions occur in tempDB
> (still have a lot of tempDbases as it seems to be the most efficient way
to
> maintain/handle the passing of tables between stored procedures.  In many
> cases, there is no easy way to eliminate the passing of the tables as a
> couple of stored procedures (the big offenders) require a large amount of
> processing.
>
>
> "Andrew J. Kelly" <sqlmvpnooospam@shadhawk.com> wrote in message
> news:e65DCOtDFHA.2632@TK2MSFTNGP12.phx.gbl...
> > It sure sounds like your storage array is not able to handle the load.
> > Having a Raid 1+0 is great but only having 6 drives is on the low end.
> > Since you are not on a SAN you only have the cache on the controllers
> > themselves to help deal with all the writes.  Make sure the cache on the
> > controllers is set to 100% write back and not read.  It also sounds like
> you
> > only have a single drive array and everything is on it (logs , data and
> > tempdb).  That also is not good.  You should have at least a Raid1 or
> > Raid1+0 for just the log files.  The spikes can be caused by lots of
> things
> > but you are probably seeing some of this due to checkpoints.  You can
use
> > the Checkpoint pages per second perfmon counter to correlate this to the
> > disk queue spikes.  You might also try using connection pooling instead
of
> > that many individual connections.  You will need multiple pools to
handle
> > the different dbs but it should cut down the actual number of
connections
> > quite a bit.Have you ran any traces to see what kind of queries are
being
> > run and who the worst offenders are?  You may have a few really poorly
> tuned
> > queries that are hurting everything else.
> >
> >
> > -- 
> > Andrew J. Kelly  SQL MVP
> >
> >
> > "Tim" <tim@pds.notanaddress.ca> wrote in message
> > news:%23v7AhlsDFHA.2804@TK2MSFTNGP14.phx.gbl...
> > > Hi Group
> > >
> > > SYSTEM:
> > >
> > > Clustered SQL Server 2000 on Windows 2000 (2 servers), with RAID 1+0,
6
> > > drives (Total of 24 20GB hard drives with mirroring/striping).  4 GB
RAM
> /
> > > Server.  All fiber back end, with Fibre Channel Storage Hub 7 ( @ 100
> > > MB/sec ).  Our SQL servers have fiber cards @ 400 MB/sec connected to
> hub.
> > > About 50 DBases, ranging in size from about 50-100 MB up to 6 GB.
They
> > > are
> > > shared by 15 clients (they are fairly balanced over a 24 hour period,
> when
> > > 1
> > > client is busy, the other isn't).  The total size of dbases is ~ 65
GB,
> > > Log
> > > File ~ 35 GB.  100-250 connected users (to our application), with
> 300-500
> > > SQL connections.  Dual XEON 2.8Ghz Processors / Server.
> > >
> > > PROBLEM:
> > >
> > > We are experiencing disk problems but are having a very difficult time
> > > diagnosing 'exactly' what is the cause and solution.  On the data
> drives,
> > > thru PerfMon, the % Disk Times are averaging between 250 and 1500,
with
> > > peaks of over 30,000 (How do I get 30,000 % disk use??).  Average disk
> > > queues are setting at about 100-250 during the problem times, but
> > > occasionally see it spike well over 1000.  The transactions/sec hover
> > > between 50-100.  Processor hovers between 10 and 40%, spiking to
70-80%.
> > > The disk bytes/sec for the data drives average betwwen 1 and 6
> MBytes/sec.
> > >
> > > I know there are a lot of variables that can be at play here, but
where
> > > should I start?  Is the system overall too small?  Over worked?? We
did
> > > have
> > > memory issues but increased the RAM and there no longer appears to be
> > > buffer
> > > cache problems and we have not yet exceeded the available RAM.  We
have
> > > removed most of our temp tables (in favor of table variables) and
> removed
> > > a
> > > lot of the cursors that were in the system.  We moved our reporting
> server
> > > to it's own box (completely seperate from prod servers).  We reduced
the
> > > log
> > > file backup intervals and staggered our database backup jobs.  We
> > > converted
> > > portions of our reporting to Analysis services.  All with short term
> > > marginal improvements.  We have to speed our system up before we start
> > > losing clients.  Does it sound/appear as though the next option would
be
> > > to
> > > scale out?  scale up?  Goto a SAN environment?  And info and advise
> would
> > > be
> > > greatly appreciated!
> > >
> > > If you got this far, thanks for the time in reading this long post!!!
> And
> > > am
> > > anxious to read any responses.  Thank you.
> > >
> > >
> >
> >
>
>


Relevant Pages

  • Re: Extremely high % Disk Time .. How is this possible??
    ... Andrew J. Kelly SQL MVP ... > Slicing them into logical drives may look better but it does not increase ... I prefer to look at disk queue ... >>> Andrew J. Kelly SQL MVP ...
    (microsoft.public.sqlserver.clustering)
  • Re: Volume/Partition labels
    ... is no reason to turn off the pagefile as Andrew has inaccurately posted. ... Running on a HP Pentium 4 with two hard drives. ... >> BIOS designates the Master as Drive 0 and the Slave as Drive 1. ... >> the Slave as Disk 1. ...
    (microsoft.public.windowsxp.newusers)
  • Re: 3B2 Disks
    ... being able to read the disk in its present format. ... 2 MFM drives on a custom controller. ... SCSI came much later as an add on card. ...
    (comp.sys.3b1)
  • Re: OT-True Image Backup
    ... Make sure the USB drives file system is NTFS and not FAT32.. ... Jaymon is correct in that the disk image you create and save to the D: ... partition of your USB external HDD will have no effect on the other ... Step-by-Step Instructions for Using the Acronis True Image Program to Backup ...
    (microsoft.public.windowsxp.help_and_support)
  • Re: OT-True Image Backup
    ... Make sure the USB drives file system is NTFS and not FAT32.. ... Jaymon is correct in that the disk image you create and save to the D: ... partition of your USB external HDD will have no effect on the other ... Since the disk images you will be creating (at least the initial backup ...
    (microsoft.public.windowsxp.help_and_support)