Re: High Avg. Disk Queue Length When Opening Shared Calendars

Hi John,

Right, mystery solved. The disks were upgraded and nobody was kind enough to
update the documentation!

Disk Configuration: 2 x 73gb disks raid 1 (operating system), 2 x 73gb disks
raid 1 (logs), 1 x 73gb hotspare, 4 x 300gb disks raid 0+1 (database), 1 x
300gb hotspare. Both raid 1 sets on one scsi channel and raid 0+1 set on the
other scsi channel. All 73gb and 300gb disks are 10k.

- The SMTP directories are on the system drive
- Indexing is enabled on all stores with 'low' setting
- PerfMon (Counter: Avg. Disk Sec/Write on DB volume), collected during
business hours shows an average of 0.185 (185 milliseconds).
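
For reference, PerfMon reports this counter in seconds, so the reading has to be converted before comparing it with the 20 ms average and 50 ms spike guideline John quotes further down the thread. A minimal sketch of that check follows; the function name `check_write_latency` is hypothetical, and only the single reported average is used as input because no per-sample data was posted.

```python
AVG_LIMIT_MS = 20.0     # average guideline quoted from the Microsoft paper in this thread
SPIKE_LIMIT_MS = 50.0   # spike guideline quoted from the same paper

def check_write_latency(samples_sec):
    """Convert PerfMon samples (seconds) to ms and compare with the guideline."""
    samples_ms = [s * 1000 for s in samples_sec]
    avg_ms = sum(samples_ms) / len(samples_ms)
    # Counts individual samples above the spike limit; it does not model
    # the "lasting more than a few seconds" part of the guideline.
    spikes = sum(1 for ms in samples_ms if ms > SPIKE_LIMIT_MS)
    return avg_ms, avg_ms > AVG_LIMIT_MS, spikes

# Only the reported business-hours average is known, so it is the sole sample here.
avg_ms, over_limit, spikes = check_write_latency([0.185])
print(f"avg {avg_ms:.0f} ms, over 20 ms guideline: {over_limit}, samples over 50 ms: {spikes}")
```

With a real PerfMon export, the list would hold one value per sample interval instead of the single average.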

So, by your figures, our 4 drive RAID 10 array consisting of 10K scsi spindles
should give us about 297.5 IOPS. I have 340 users using Exchange at the moment,
out of a possible 731 mailboxes. PTA at the same time is reporting high RPC
load, and this is the case most of the day!

Do you think we have a disk bottleneck on our DB drive? I assume that we would
be a lot better off with more spindles?

Kind Regards,

Daran Clarke.


"John Fullbright" wrote:

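Let's assume 10K spindles, a 3:1 read/write ratio, and IOPS/user of 1. Take
P = 85 IOPS per 10K spindle and N' = 3 data spindles (a 4 drive RAID 5 set
less one spindle's worth of parity).

RAID 5 write performance: P*N'/4 = 85*3/4 = 63.75 IOPS
RAID 5 read performance: P*N' = 85*3 = 255 IOPS

Applying the read/write ratio:

255*.75 + 63.75*.25 = 191.25 + 15.9375 = 207.1875 IOPS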

That would support about 207 users. How many users are in your environment?
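
The arithmetic above can be sketched in a few lines of Python. The function name `raid5_mixed_iops` and its parameters are illustrative only; the 85 IOPS per 10K spindle figure and the 4:1 RAID 5 write penalty are the rule-of-thumb assumptions used in this post, not measured values.

```python
def raid5_mixed_iops(per_spindle_iops, spindles, read_fraction):
    """Estimate mixed IOPS for a RAID 5 set using the rule of thumb above."""
    data_spindles = spindles - 1                 # N': one spindle's worth goes to parity
    read_iops = per_spindle_iops * data_spindles
    write_iops = read_iops / 4                   # assumed 4:1 RAID 5 write penalty
    return read_iops * read_fraction + write_iops * (1 - read_fraction)

# 4 x 10K spindles, 3:1 read/write ratio (75% reads)
estimate = raid5_mixed_iops(per_spindle_iops=85, spindles=4, read_fraction=0.75)
print(estimate)

# At an assumed 1 IOPS per user, the estimate is also the user count.
print(int(estimate), "users")
```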


Now let's look at the logs. You have a single RAID 1 with three sets of
logs on the same set of physicals, so essentially you have a random IO
pattern. Read performance for a mirror of 10K scsi disks is 170 IOPS and write
performance is 85 IOPS. With the assumptions we used for the databases, your
log LUN would be capable of supporting 340 users. As your user count increases
over 200, the database LUN would become a bottleneck prior to the log LUN.
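
The post does not spell out how the 340-user figure is reached. One reading that reproduces it, offered here as an assumption and not as the author's stated method, is that each user generates 1 IOPS at a 3:1 read/write ratio, so 0.25 write IOPS per user, and the mirror's 85 write IOPS is divided by that. The function name below is hypothetical.

```python
def users_supported_by_writes(write_iops, iops_per_user, read_write_ratio):
    """Users a LUN can carry if only its write IOPS limit matters (assumed model)."""
    reads, writes = read_write_ratio
    write_fraction = writes / (reads + writes)   # 3:1 gives 0.25
    return write_iops / (iops_per_user * write_fraction)

# RAID 1 mirror of 10K spindles: 85 write IOPS, per the figures in this post
log_lun_users = users_supported_by_writes(85, 1, (3, 1))
print(log_lun_users)
```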

You didn't name the location of the SMTP directories (we'll assume the boot
volume) or the working directory (we'll assume the default, the first
database drive). You also did not state if you had indexing turned on, and
if so what priority level it is set for. The gatherer logs, the message
tracking logs, and the working directory are all located on the first
database drive by default, and would further degrade your database
performance effectively reducing the number of users your system could
support.

Last decade, when 5.5 was state of the art, read/write ratios were closer to
8:1, and RAID 5 could occasionally provide acceptable performance given
enough spindles. Today, with E2K3 and OL2K3 client side caching, read/write
ratios are closer to 3:1 or 2:1. The 4:1 write penalty for RAID 5 makes it
unsuitable for Exchange. You clearly have a potential bottleneck on your
database LUN.

So, hypothesis 1: A disk bottleneck on the database LUN.

To prove or disprove this, collect the perfmon PhysicalDisk counter Avg.
Disk sec/Write for the e: drive. The Microsoft paper "Optimizing Storage IO
Performance for Exchange Server 2003" states that IO should not "average
more than 20ms or have spikes greater than 50ms lasting more than a few
seconds." When you read the counter, .020 is 20ms. Does the data you
collected meet the MS standard?

Yes - you don't have a disk bottleneck.
No - you do have a disk bottleneck.

In the "No" case, the solution would be to dump the RAID 5 and go RAID 10.
With the same 4 spindles, RAID 10 provides write performance of 170 IOPS and
read performance of 340 IOPS. For a 3:1 read/write ratio the mixed
performance of a 4 drive RAID 10 array consisting of 10K scsi spindles is
297.5 IOPS, roughly 44% higher than that of a RAID 5 array with the same
number and type of spindles. The price you pay for that performance increase
is space. A 4 drive RAID 5 set consisting of 144GB drives (right sized to
137GB) gives you 411GB. A 4 drive RAID 10 set consisting of the same 4
spindles gives you 274GB. I count 249GB total size of databases.
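
The RAID 5 versus RAID 10 trade-off can be worked through as a short sketch. The constants are the figures used in this post (85 IOPS per 10K spindle, 137GB usable per drive, 75% reads); the helper name `mixed` and the variable names are illustrative only.

```python
P = 85                 # assumed IOPS per 10K spindle
N = 4                  # spindles in the array
READ_FRACTION = 0.75   # 3:1 read/write ratio
DRIVE_GB = 137         # 144GB drive, right sized

def mixed(read_iops, write_iops, read_fraction=READ_FRACTION):
    """Blend read and write IOPS by the read/write ratio."""
    return read_iops * read_fraction + write_iops * (1 - read_fraction)

# RAID 5: N-1 data spindles, 4:1 write penalty
raid5_iops = mixed(P * (N - 1), P * (N - 1) / 4)
raid5_gb = DRIVE_GB * (N - 1)

# RAID 10: all spindles serve reads, every write lands on two spindles
raid10_iops = mixed(P * N, P * N / 2)
raid10_gb = DRIVE_GB * N // 2

gain_pct = (raid10_iops / raid5_iops - 1) * 100
print(f"RAID 5:  {raid5_iops} IOPS, {raid5_gb}GB")
print(f"RAID 10: {raid10_iops} IOPS, {raid10_gb}GB")
print(f"RAID 10 gain: {gain_pct:.1f}%")
```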

BTW: Given " 4 x 144gb disks raid 5 (database), " and "Drive (E: - DB)
308GB Free" and "Size of first storage group (public folders): 50GB, Size
of second storage group (mailboxes A-L): 93GB, Size of third storage group:
106GB" something definitely isn't adding up. To see the actual sizes of
your databases, dismount and then mount them. This will update the reported
sizes of the .stm and .edb files, as well as the free space available on the
drive.

John














"Daran Clarke" <DaranClarke@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:F2F4DFBD-4272-47CB-A7EA-12BF27D14BBA@xxxxxxxxxxxxxxxx
Hi Nuevo,

Server: Cyclone Dual 3.2ghz Xeon Processor Server, 4 gb ram (4 x 1gb), 5 x
73GB scsi disks, 5 x 144gb scsi disks, Intel Raid SRCU42X controller with
128mb mem.

Disk Configuration: 2 x 73gb disks raid 1 (operating system), 2 x 73gb disks
raid 1 (logs), 1 x 73gb hotspare, 4 x 144gb disks raid 5 (database), 1 x
144gb hotspare. Both raid 1 sets on one scsi channel and raid 5 set on other
scsi channel.

Operating system: Windows 2003 Standard Edition (no SP), Hotfixes: 819696,
823182, 823559, 823980, 824105, 824141, 824145, 824146, 825119, 828028,
828035, 828741, 828750, 831464, 832894, 835732, 837001, 837009, 839643,
840374.

Installed Software: Exchange 2003 Service Pack 1, Trend Micro Scanmail For
Exchange Service Pack 3, MOM 2005 Agent, CommVault Agents

Drive (C: - OS) 57GB Free, Drive (D: - Logs) 54GB Free, Drive (E: - DB)
308GB Free
Size Of Logs: 10GB, Size of first storage group (public folders): 50GB, Size
of second storage group (mailboxes A-L): 93GB, Size of third storage group:
106GB

--
Kind Regards,

Daran Clarke.


"Nuevo" wrote:

Can you explain how you have your disks set up: raid level, location of logs,
dbs, os and exchange binaries?

Nue
"Daran Clarke" <DaranClarke@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:DF1CF206-4923-4F9E-8C7C-3EFF5E680E96@xxxxxxxxxxxxxxxx
Hi Everyone,

I'm seeing strange performance issues. More often than not, when someone
opens a shared calendar (usually one the user has not used in a while), I see
a high Avg. Disk Queue Length count in perfmon (7-8). Once the calendar is
open, the count drops back below 1. This is quite consistent. I am also
seeing RPC latency creeping up while this is going on, and users find that
access to mailboxes etc. is very slow. Does anyone have any idea what this
could be? I am also getting (but only on occasion) quite a lot of these
Events: 8206, 8264, 8230 - again, issues with calendars.

Kind Regards,

Daran Clarke.





