Re: High Avg. Disk Queue Length When Opening Shared Calendars
- From: Daran Clarke <DaranClarke@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 3 Apr 2006 18:05:01 -0700
Hi John,
Right, mystery solved. The disks were upgraded and nobody was kind enough to
update the documentation!
Disk Configuration: 2 x 73gb disks raid 1 (operating system), 2 x 73gb disks
raid 1 (logs), 1 x 73gb hotspare, 4 x 300gb disks raid 0+1 (database), 1 x
300gb hotspare. Both raid 1 sets are on one scsi channel and the raid 0+1 set
is on the other scsi channel. All 73gb disks are 10k and all 300gb disks are 10k.
- The SMTP directories are on the system drive
- Indexing is enabled on all stores with 'low' setting
- PerfMon (Counter: Avg. Disk Sec/Write on DB volume), collected during
business hours, shows an average of 0.185 (185 milliseconds).
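As a rough check, that reading can be compared with the 20 ms average guideline from the Microsoft storage paper quoted in John's reply further down. The sketch below is a minimal illustration with hypothetical helper names. It checks only the average: the 50 ms spike rule needs per-sample data that is not given here.

```python
# Minimal sketch: compare a perfmon "Avg. Disk sec/Write" value with the
# 20 ms average guideline. Perfmon reports this counter in seconds.
AVG_LIMIT_MS = 20.0  # average limit quoted from Microsoft's Exchange 2003 storage paper


def seconds_to_ms(seconds: float) -> float:
    """Convert a perfmon seconds-per-IO value to milliseconds."""
    return seconds * 1000.0


def meets_average_guideline(avg_sec_per_write: float) -> bool:
    """True if the average write latency is at or below 20 ms (spikes are not checked)."""
    return seconds_to_ms(avg_sec_per_write) <= AVG_LIMIT_MS


reading = 0.185  # business-hours average on the DB volume, from this post
ms = seconds_to_ms(reading)
print(f"{ms:.0f} ms vs {AVG_LIMIT_MS:.0f} ms limit, ok={meets_average_guideline(reading)}")
# 185 ms vs 20 ms limit, ok=False
```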
So, by your numbers, our 4 drive RAID 10 array consisting of 10K scsi spindles
is good for about 297.5 IOPS. I have 340 users using Exchange at the moment,
out of a possible 731 mailboxes. PTA is reporting high RPC load at the same
time, and this is the case most of the day!
Do you think we have a disk bottleneck on our DB drive? I assume that we would
be a lot better off with more spindles?
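To put rough numbers on the "more spindles" question, the sketch below reuses the planning assumptions from John's reply below: 85 IOPS per 10K spindle, a 3:1 read/write ratio, and 1 IOPS per user. It then asks how many disks a RAID 10 set would need. These are assumptions, not measurements, and the helper names are hypothetical.

```python
import math

# Planning assumptions taken from John's reply in this thread (not measurements).
SPINDLE_IOPS = 85.0   # random IOPS for one 10K SCSI spindle
READ_FRACTION = 0.75  # 3:1 read/write ratio
IOPS_PER_USER = 1.0   # load per active user


def raid10_mixed_iops(disks: int) -> float:
    """Mixed IOPS of a RAID 10 set: every spindle serves reads, each write costs two IOs."""
    reads = SPINDLE_IOPS * disks
    writes = SPINDLE_IOPS * disks / 2
    return reads * READ_FRACTION + writes * (1 - READ_FRACTION)


def raid10_disks_needed(users: int) -> int:
    """Smallest even disk count (at least 4) whose mixed IOPS covers the user load."""
    disks = math.ceil(users * IOPS_PER_USER / raid10_mixed_iops(1))
    disks += disks % 2       # mirrored pairs need an even number of disks
    return max(disks, 4)     # RAID 10 needs at least two mirrored pairs


print("4 disks:", raid10_mixed_iops(4), "IOPS")  # 4 disks: 297.5 IOPS
for users in (340, 731):
    print(users, "users ->", raid10_disks_needed(users), "disks")
# 340 users -> 6 disks
# 731 users -> 10 disks
```

This ignores disk capacity and the extra load from indexing, SMTP and message tracking that John mentions, so the measured Avg. Disk sec/Write remains the better guide.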
Kind Regards,
Daran Clarke.
"John Fullbright" wrote:
Let's assume 10K spindles, a 3:1 read/write ratio, and 1 IOPS per user, with
P = 85 IOPS per 10K spindle and N' = 3 (the spindles left after parity in your
4-disk RAID 5 set).
RAID 5 write performance: P*N'/4 = 85*3/4 = 63.75 IOPS
RAID 5 read performance: P*N' = 85*3 = 255 IOPS
Applying the read/write ratio:
255*.75 + 63.75*.25 = 191.25 + 15.9375 = 207.1875 IOPS
That would support about 207 users. How many users are in your environment?
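The arithmetic above can be reproduced with a short sketch that keeps the same notation: P is the IOPS of one 10K spindle (85) and N' is the number of spindles left after parity (3 in a 4-disk RAID 5 set). The function names are illustrative only.

```python
P = 85.0              # IOPS of one 10K spindle, as used above
N_PRIME = 3           # spindles left after parity in a 4-disk RAID 5 set
READ_FRACTION = 0.75  # 3:1 read/write ratio
IOPS_PER_USER = 1.0   # assumed load per user


def raid5_write_iops(p: float, n_prime: int) -> float:
    return p * n_prime / 4  # RAID 5 write penalty of 4


def raid5_read_iops(p: float, n_prime: int) -> float:
    return p * n_prime


def mixed_iops(read_iops: float, write_iops: float, read_fraction: float) -> float:
    """Weight read and write capability by the read/write mix."""
    return read_iops * read_fraction + write_iops * (1 - read_fraction)


mixed = mixed_iops(raid5_read_iops(P, N_PRIME), raid5_write_iops(P, N_PRIME), READ_FRACTION)
print(mixed, "IOPS ->", int(mixed / IOPS_PER_USER), "users")  # 207.1875 IOPS -> 207 users
```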
Now let's look at the logs. You have a single RAID 1 with three sets of
logs on the same set of physicals, so essentially you have a random IO
pattern. Read performance for a mirror of 10K scsi disks is 170 IOPS and write
performance is 85 IOPS. With the assumptions we used for the databases, your log
LUN would be capable of supporting 340 users. As your user count increases
over 200, the database LUN would become a bottleneck before the log LUN.
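The 340-user figure for the log LUN is not derived explicitly above. One reading that reproduces it, sketched here as an assumption rather than a stated method: the transaction logs receive only the write share of each user's IO. With 1 IOPS per user and a 3:1 ratio, each user adds 0.25 write IOPS, so the mirror's 85 write IOPS covers 85 / 0.25 = 340 users.

```python
# Assumption: the log LUN absorbs only the write portion of each user's IO.
MIRROR_WRITE_IOPS = 85.0  # write IOPS of a RAID 1 pair of 10K disks (from the thread)
WRITE_FRACTION = 0.25     # 3:1 read/write ratio -> one IO in four is a write
IOPS_PER_USER = 1.0


def log_lun_users(write_iops: float = MIRROR_WRITE_IOPS,
                  write_fraction: float = WRITE_FRACTION,
                  iops_per_user: float = IOPS_PER_USER) -> float:
    """Users a log LUN can carry if it only sees their writes."""
    return write_iops / (iops_per_user * write_fraction)


print(log_lun_users())  # 340.0
```

Set against the roughly 207 users the RAID 5 database LUN supports, this is consistent with the database LUN saturating first.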
You didn't name the location of the SMTP directories (we'll assume the boot
volume) or the working directory (we'll assume the default, the first
database drive). You also did not state if you had indexing turned on, and
if so what priority level it is set for. The gatherer logs, the message
tracking logs, and the working directory are all located on the first
database drive by default, and would further degrade your database
performance, effectively reducing the number of users your system could
support.
Last decade, when 5.5 was state of the art, read/write ratios were closer to
8:1, and RAID 5 could occasionally provide acceptable performance given
enough spindles. Today, with E2K3 and OL2K3 client-side caching, read/write
ratios are closer to 3:1 or 2:1. The 4:1 write penalty for RAID 5 makes it
unsuitable for Exchange. You clearly have a potential bottleneck on your
database LUN.
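To see how the shift from 8:1 toward 3:1 or 2:1 affects each layout, the sketch below applies the same weighted-IOPS model used in this reply to four 10K spindles at each ratio. It reuses the RAID 5 figures (255 read, 63.75 write) and RAID 10 figures (340 read, 170 write) from this thread. It illustrates the model, not a benchmark.

```python
# Read/write capability of four 10K spindles, using the figures from this thread.
ARRAYS = {
    "RAID 5": (255.0, 63.75),   # (read IOPS, write IOPS)
    "RAID 10": (340.0, 170.0),
}


def mixed_iops(read_iops: float, write_iops: float, reads_per_write: float) -> float:
    """Weighted IOPS for a reads_per_write:1 workload."""
    read_fraction = reads_per_write / (reads_per_write + 1)
    return read_iops * read_fraction + write_iops * (1 - read_fraction)


for ratio in (8, 3, 2):
    r5 = mixed_iops(*ARRAYS["RAID 5"], ratio)
    r10 = mixed_iops(*ARRAYS["RAID 10"], ratio)
    print(f"{ratio}:1  RAID 5 {r5:6.1f}  RAID 10 {r10:6.1f}")
```

As writes make up a larger share of the mix, the RAID 5 figure falls faster than the RAID 10 one, because every RAID 5 write costs four IOs instead of two.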
So, hypothesis 1: A disk bottleneck on the database LUN.
To prove or disprove this, collect the perfmon PhysicalDisk counter Avg.
Disk sec/Write for the E: drive. The Microsoft paper "Optimizing Storage IO
Performance for Exchange Server 2003" states that IO should not "average
more than 20ms or have spikes greater than 50ms lasting more than a few
seconds." When you read the counter, .020 is 20ms. Does the data you
collected meet the MS standard? Yes - you don't have a disk bottleneck.
No - you do have a disk bottleneck. The solution would be to dump the RAID
5 and go RAID 10. With the same 4 spindles, RAID 10 provides write
performance of 170 IOPS and read performance of 340 IOPS. For a 3:1
read/write ratio the mixed performance of a 4 drive RAID 10 array consisting
of 10K scsi spindles is 297.5 IOPS (340*.75 + 170*.25), about 44% higher than
that of a RAID 5 array with the same number and type of spindles. The price
you pay for that performance increase is space. A 4 drive RAID 5 set
consisting of 144GB drives (right-sized to 137GB) gives you 411GB. A 4 drive
RAID 10 set consisting of the same 4 spindles gives you 274GB. I count 249GB
of databases in total.
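The space side of the trade-off can be checked the same way. This sketch takes the figures above as given: each 144GB drive right-sizes to 137GB, and the databases total 249GB (50 + 93 + 106GB). It also prints the relative speedup of RAID 10 over RAID 5 from the two mixed-IOPS figures. The helper names are illustrative.

```python
RIGHT_SIZED_GB = 137          # usable size of one 144GB drive, per the thread
DISKS = 4
DATABASE_GB = 50 + 93 + 106   # storage group sizes reported in the thread
RAID5_MIXED, RAID10_MIXED = 207.1875, 297.5  # mixed IOPS at 3:1 from the thread


def raid5_usable_gb(disks: int, size_gb: int) -> int:
    return (disks - 1) * size_gb  # one disk's worth of space goes to parity


def raid10_usable_gb(disks: int, size_gb: int) -> int:
    return disks // 2 * size_gb   # half the disks hold mirror copies


r5 = raid5_usable_gb(DISKS, RIGHT_SIZED_GB)
r10 = raid10_usable_gb(DISKS, RIGHT_SIZED_GB)
gain = RAID10_MIXED / RAID5_MIXED - 1
print(f"RAID 5 {r5}GB, RAID 10 {r10}GB, databases {DATABASE_GB}GB, "
      f"fits in RAID 10: {DATABASE_GB <= r10}, RAID 10 speedup {gain:.0%}")
# RAID 5 411GB, RAID 10 274GB, databases 249GB, fits in RAID 10: True, RAID 10 speedup 44%
```

On these numbers the databases fit in RAID 10 with only 25GB to spare.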
BTW: Given "4 x 144gb disks raid 5 (database)" and "Drive (E: - DB)
308GB Free" and "Size of first storage group (public folders): 50GB, Size
of second storage group (mailboxes A-L): 93GB, Size of third storage group:
106GB", something definitely isn't adding up. To see the actual sizes of
your databases, dismount and then mount them. This will update the reported
sizes of the .stm and .edb files, as well as the free space available on the
drive.
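The mismatch reduces to one comparison: reported free space plus database size should not exceed the usable size of the volume. A minimal sketch using the figures posted in this thread (the helper name is hypothetical):

```python
def consistent(usable_gb: float, free_gb: float, used_gb: float) -> bool:
    """Free space plus data on a volume cannot exceed its usable size."""
    return free_gb + used_gb <= usable_gb


usable = 3 * 137          # 4 x 144GB RAID 5, right-sized to 137GB per drive
free = 308                # reported free space on E:
used = 50 + 93 + 106      # reported storage group sizes
print(free + used, "GB claimed vs", usable, "GB usable:", consistent(usable, free, used))
# 557 GB claimed vs 411 GB usable: False
```

The follow-up at the top of the thread resolves this. The database set turned out to be 4 x 300gb disks in raid 0+1, not the documented 4 x 144gb raid 5.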
John
"Daran Clarke" <DaranClarke@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:F2F4DFBD-4272-47CB-A7EA-12BF27D14BBA@xxxxxxxxxxxxxxxx
Hi Nuevo,
Server: Cyclone Dual 3.2ghz Xeon Processor Server, 4 gb ram (4 x 1gb), 5 x
73GB scsi disks, 5 x 144gb scsi disks, Intel Raid SRCU42X controller with
128mb mem.
Disk Configuration: 2 x 73gb disks raid 1 (operating system), 2 x 73gb
disks raid 1 (logs), 1 x 73gb hotspare, 4 x 144gb disks raid 5 (database),
1 x 144gb hotspare. Both raid 1 sets are on one scsi channel and the raid 5
set is on the other scsi channel.
Operating system: Windows 2003 Standard Edition (no SP), Hotfixes: 819696,
823182, 823559, 823980, 824105, 824141, 824145, 824146, 825119, 828028,
828035, 828741, 828750, 831464, 832894, 835732, 837001, 837009, 839643,
840374.
Installed Software: Exchange 2003 Service Pack 1, Trend Micro Scanmail For
Exchange Service Pack 3, MOM 2005 Agent, CommVault Agents
Drive (C: - OS) 57GB Free, Drive (D: - Logs) 54GB Free, Drive (E: - DB)
308GB Free
Size of Logs: 10GB, Size of first storage group (public folders): 50GB,
Size of second storage group (mailboxes A-L): 93GB, Size of third storage
group: 106GB
--
Kind Regards,
Daran Clarke.
"Nuevo" wrote:
Can you explain how you have your disks set up: RAID level, and the location
of the logs, DBs, OS and Exchange binaries?
Nue
"Daran Clarke" <DaranClarke@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:DF1CF206-4923-4F9E-8C7C-3EFF5E680E96@xxxxxxxxxxxxxxxx
Hi Everyone,
I'm getting strange performance issues. On occasion (it happens more often
than not), when someone opens a shared calendar (usually one a user has not
used in a while), I see a high Avg. Disk Queue Length in perfmon (7-8). Once
the calendar is open, the count drops back to below 1. This is quite
consistent. I am also seeing RPC latency creep up while this is going on,
and users find that access to mailboxes etc. is very slow. Does anyone have
any idea what this could be? I am also getting (but only on occasion) quite
a lot of these events: 8206, 8264, 8230 - again, issues with calendars.
Kind Regards,
Daran Clarke.