Re: Sizing Exchange Transaction, SMTP, MTA and Quorum Drives???
- From: TonyP <TonyP@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 2 Aug 2007 08:00:03 -0700
Hi John
I have added comments below on the areas you mentioned that I am unsure
about:
"John Fullbright" wrote:
For shared disks:
1. Put the DTC and the quorum on the same drive.
2. Put the databases for each storage group on a drive (4 storage groups, 4
drives). This will give you a granularity of restore at the SG level.
3. Put the transaction logs for each storage group on their own drive (4
storage groups 4 drives).
4. Put the SMTP directories on a drive. It's important for the SMTP queues
to be on shared disk in 2003. If you leave it local, every time you fail
over you will strand messages. When you fail back they will mysteriously
reappear, maybe days or months later.
5. The MTA run and database directories are on the first database drive
path by default. Unless you have some abnormally high MTA activity (mixed
mode with 5.5 and this is a bridgehead for example) leave it there.
Ok, you have a theoretical .85 IOPS/user. Measure a test group or a subset
of your user base. 50% concurrency? The concurrency ratio is a very common
pitfall. Unless your users are on shifts, in the air and can't access a
computer, etc., you'll get burned. Along will come the end of the quarter,
everyone will be burning the midnight oil, you'll have 90% of your users on
at once, performance will dive, and key staff will miss end of quarter
reporting deadlines. Result: you'll be looking for a new job.
Assume 100% unless it's physically impossible. Measure (over a long
period, several quarters, and use the peak value + 20%) if you'll be using
anything less. The read/write ratio is important also.
Concurrency is an issue. They are assuming 50 percent based on a present
4-node AAAP cluster they have, which hosts roughly 25,000 users, 8000 users
per node. They are informing me that only 50 percent are concurrent at any
one time.
I have monitored a series of Perfmon counters on a monthly basis and also on
a daily basis. When I look at the MSExchangeIS counter "Active User Count"
on the busiest node, it approaches 10,000 users even though the node holds
only 8000 users.
I am assuming 100 percent concurrency for the Storage Design.
The internal technical lead in the team is providing input saying it is the
MSExchangeIS counter "User Count" which is important, and that it shows 50
percent concurrency?
They are saying some users are showing more than one session to the
information store?
Hence they are saying User Count is a more accurate figure to use than
Active User Count, which shows some users have more than one connection?
RAID 10 has a write penalty of 2, so the impact of increases in the
percentage of writes is amplified. Assume a 2:1 read/write ratio in Exchange
2003 with Outlook cached mode clients. Assume 3:1 if the Outlook clients are
not cached. Make sure you add in IOPS for online maintenance and backups.
When you build your arrays, make sure the IOPS/spindle number is at 20 ms
response time or less. Sure, a 15K spindle can reach a maximum of 300 IOPS
or so, but the response time at that level can be measured in seconds. You
want an average response time of less than 20 ms with no spikes greater than
50 ms lasting more than a few seconds. For 4K random IOs, use the following:
10K RPM SCSI  -  90 IOPS/spindle @ 20 ms response time
15K RPM SCSI  - 130 IOPS/spindle @ 20 ms response time
7200 RPM SATA -  40 IOPS/spindle @ 20 ms response time
Where P is the performance of a single spindle, and N is the number of
spindles in the RAID set, for RAID 1/10,
Read performance = P*N
Write performance = P*N/2
So if I have 4 10K SCSI drives in a RAID 10 array,
Read performance = 360 IOPS
Write performance = 180 IOPS
Applying a 2:1 read write ratio, the composite performance is
(360+360+180)/3 = 300 IOPS.
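The composite figure above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as written in this post; the function and parameter names are my own, and the weighting (each read counted at read speed, each write at write speed) is simply the (360+360+180)/3 averaging shown above, not a vendor formula.

```python
def raid10_composite_iops(per_spindle_iops, spindles, reads, writes):
    """Return (read, write, composite) IOPS for a RAID 1/10 set.

    reads:writes is the read/write ratio, e.g. 2 and 1 for 2:1.
    """
    read_perf = per_spindle_iops * spindles       # P * N
    write_perf = per_spindle_iops * spindles / 2  # P * N / 2 (write penalty of 2)
    # Weighted average over the mix, as in (360 + 360 + 180) / 3.
    composite = (reads * read_perf + writes * write_perf) / (reads + writes)
    return read_perf, write_perf, composite


# Four 10K RPM SCSI spindles at 90 IOPS each, 2:1 read/write ratio.
read_perf, write_perf, composite = raid10_composite_iops(90, 4, reads=2, writes=1)
print(read_perf, write_perf, composite)
```

With the 4 x 10K example this gives the same 360 / 180 / 300 figures as the hand calculation.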
NOTE: Just as a comparative reference point, a RAID 5 array with the same 4
disks would have a composite performance of 201 IOPS; that's why you don't
use RAID 5. At a 1:1 read/write ratio, RAID 5 has less than half the
performance of RAID 10, so don't consider it in an Exchange 2007 solution
either.
I am VERY confused in this area about COMPOSITE performance, since I don't
know the number of spindles I require YET.
I am trying to work out the number of drives (spindles) required to meet my
performance needs.
John, I used this article before your post and worked through the following:
http://www.petri.co.il/sizing_exchange_part_2.htm
(number of disks) = (IOPS/mailbox × number of mailboxes) ÷ (IOPS/disk × RAID
penalty factor)
RAID 10 penalty = (R + W)/(R + 2W)
Again, since I have no sound statistical data due to latency on the present
4-node cluster, I will assume 3 reads for every 1 write, since the Outlook
clients are not cached:
RAID 10 penalty = (3 + 1) / (3 + 2(1))
RAID 10 penalty = 0.8
Hence each Storage Group, which hosts 1875 users, will need
= IOPS/mailbox * number of mailboxes
= 0.84 * 1875
= 1575 IOPS
To handle spikes, it is recommended to add a 20 percent buffer to the
storage design:
Peak Storage Group DB IOPS
= 1575 * 1.20
= 1890
Now the standard calculation:
(Number of disks) = (IOPS/mailbox * number of mailboxes) / (IOPS/disk at
20 ms * RAID penalty factor)
Number of disks = 1890 / (130 * 0.8)
Number of disks = 1890 / 104
Number of disks = 18.17
Since we are using RAID 10 we must round up to the nearest even number.
Number of disks = 20
Thus the number of disks required per Storage Group to host 1875 users is
20 15K RPM SCSI drives in RAID 10.
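For anyone wanting to rerun this with other inputs, here is a rough Python sketch of the same spindle count calculation. It only follows the formula from the article linked above; the function names and the headroom parameter are my own, and the inputs (0.84 IOPS/mailbox, 130 IOPS/disk, 3:1 read/write, 20 percent buffer) are the assumptions already stated in this post.

```python
import math


def raid10_penalty_factor(reads, writes):
    """RAID 10 penalty factor (R + W) / (R + 2W) for a reads:writes mix."""
    return (reads + writes) / (reads + 2 * writes)


def raid10_disks_required(iops_per_mailbox, mailboxes, iops_per_disk,
                          reads, writes, headroom=0.20):
    """Return (peak IOPS, disk count) for one storage group on RAID 10."""
    peak_iops = iops_per_mailbox * mailboxes * (1 + headroom)
    effective_iops_per_disk = iops_per_disk * raid10_penalty_factor(reads, writes)
    disks = math.ceil(peak_iops / effective_iops_per_disk)
    if disks % 2:  # RAID 10 needs mirrored pairs, so round up to an even count
        disks += 1
    return peak_iops, disks


peak_iops, disks = raid10_disks_required(0.84, 1875, 130, reads=3, writes=1)
print(round(peak_iops), disks)
```

Changing reads/writes to 2 and 1 shows how sensitive the spindle count is to the read/write ratio.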
Database Storage Group size = number of users * mailbox size
The client has defined the mailbox size as 180 MB.
Hence
Database Size = 1875 * 180 MB
= 329.59 GB
So each Database Storage Group is required to be no bigger than 330 GB.
The disks recommended by HP are 146 GB in size.
The total storage generated by meeting our performance requirement for the
Storage Group comes to
Total Storage for Performance = 10 disks * 146 GB
= 1460 GB
Note: 20 disks in RAID 10 are needed to meet the IOPS requirement, hence 10
disks are available for storage.
Our previous capacity figure above suggests we only need 330 GB per Storage
Group.
But assigning 1460 GB to each LUN using traditional storage methods leads
to a huge waste of disk storage space.
With virtualized storage techniques we can create 4 LUNs from our 10 disks:
= 1460 GB / 4
= 365 GB – size of each LUN
Hence we meet our capacity requirement, since each LUN created is greater
than 330 GB, and we also use our 10 disks more efficiently.
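The capacity side of this can be checked the same way. The sketch below only reproduces the raw arithmetic in this post (1875 mailboxes at 180 MB, 20 x 146 GB disks in RAID 10, 4 LUNs); the names are my own, and it says nothing about whether the LUNs still deliver the IOPS once they share spindles.

```python
MB_PER_GB = 1024


def database_size_gb(mailboxes, mailbox_limit_mb):
    """Maximum database size if every mailbox reaches its limit."""
    return mailboxes * mailbox_limit_mb / MB_PER_GB


def raid10_usable_gb(disks, disk_size_gb):
    """Usable capacity of a RAID 10 set: half the disks hold mirror copies."""
    return (disks // 2) * disk_size_gb


db_gb = database_size_gb(1875, 180)
usable_gb = raid10_usable_gb(20, 146)
lun_gb = usable_gb / 4  # four equal LUNs carved from the same set

print(round(db_gb, 2), usable_gb, lun_gb, lun_gb >= db_gb)
```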
The performance IOPS figures we calculated previously were based on all
physical spindles within each LUN being dedicated to the Storage Group.
Since the physical spindles are now NOT dedicated to a single LUN but are
shared amongst 4 LUNs, is there a loss in performance IOPS?
So is each LUN carved from the array set now not giving me the
performance I defined for 1875 users (1890 IOPS)?
Is this a case of commingling, where IO against one LUN negatively impacts
the performance of other LUNs that share the same physical spindles?
I am now seriously concerned about my reasoning, since you have talked about
"P is the performance of a single spindle, and N is the number of spindles in
the RAID set" and working out composite performance etc.
To determine the IOPS for the transaction logs, you divide the database IOPS
by anywhere from 8 to 12, with 10 being common. I tend to use the 8 figure
to stay on the conservative side. The size of the log lun is another story.
What is your average 24 hour change delta? What is the peak? If you collect
change delta information over a period of time, what is the mean and the
standard deviation of the dataset?
How are you collecting this data? Perfmon counters?
How often do you do a full backup and truncate the logs? What is your
backup failure tolerance in days (how long should the system stay operational
if backups start failing? Generally 4-7 days to cover a long weekend and
troubleshooting)? What level of reliability is required? The answers to
these questions will tell you how large to make the log LUNs. For example,
let's assume:
1. Mean change delta 9GB
2. SD of sample set 1GB
3. Backup failure tolerance 7 days
4. 99.9% reliability
We start with the mean change delta, then add enough standard deviations to
reach or exceed the required level of reliability (3 in this case), so our
Change delta size is 9+(1*3) = 12GB. Now, we take this figure and multiply
by our backup failure tolerance and our LUN is 84GB. I can say with
99.97559% accuracy that an 84GB LUN will withstand 7 consecutive days of
backup failure before the drive fills and the stores dismount.
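As a quick sketch of the log sizing steps above (Python, with function names of my own choosing): the first function is the mean plus N standard deviations times the backup failure tolerance, and the second is the "database IOPS divided by 8 to 12" rule of thumb, using the conservative divisor of 8. The 1890 figure is the peak database IOPS per storage group worked out earlier in this thread.

```python
def log_lun_size_gb(mean_delta_gb, sd_gb, num_sd, tolerance_days):
    """Log LUN size: (mean + num_sd * SD) per day, times days of tolerance."""
    daily_delta_gb = mean_delta_gb + num_sd * sd_gb
    return daily_delta_gb * tolerance_days


def log_iops(database_iops, divisor=8):
    """Transaction log IOPS as a fraction of database IOPS (divisor 8 to 12)."""
    return database_iops / divisor


# Mean 9 GB, SD 1 GB, 3 standard deviations, 7 days of failed backups.
print(log_lun_size_gb(9, 1, 3, 7))
# Log IOPS for a storage group with 1890 peak database IOPS.
print(log_iops(1890))
```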
You can take a similar statistical approach to sizing the SMTP LUN; take a
sample set of the maximum size over a long collection period.
How are you collecting this data? Perfmon counters?
Figure out the mean and SD, then add enough standard deviations to the mean
to reach the desired level of reliability. A lot of folks don't bother, and
just allocate an
overly large disk (50 - 100GB) to cover normal traffic and any potential
loops/chain mails/store outages/etc.. without going offline. I believe
Optimizing Storage for Exchange Server 2003 says 500 IOPS, however, I would
measure. The number of IOs depends on the number of messages, message
sizes, destination, retries, etc. On average, the categorizer touches an
eml file in the queue directory 7 times.
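If you do collect samples, the mean and SD step can be sketched with Python's standard statistics module. The sample values below are purely hypothetical placeholders, not measurements from any real system, and the choice of 3 standard deviations just mirrors the log LUN example earlier.

```python
import statistics


def size_from_samples(samples_gb, num_sd=3):
    """Mean of the samples plus num_sd sample standard deviations."""
    mean_gb = statistics.mean(samples_gb)
    sd_gb = statistics.stdev(samples_gb)  # sample SD, needs at least 2 values
    return mean_gb + num_sd * sd_gb


# Hypothetical daily maximum queue directory sizes in GB (made-up values).
queue_max_gb = [1.2, 0.8, 2.5, 1.1, 0.9, 3.0, 1.4]
print(round(size_from_samples(queue_max_gb), 1))
```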
"TonyP" <TonyP@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:3A3EFF63-5E9C-4E58-996F-CF2137F7D4FB@xxxxxxxxxxxxxxxx
Hi
Currently trying to design a 3 node cluster comprising 2 Active nodes and
1 Passive node in an Exchange 2003 environment.
It will have 4 Storage Groups per node, which will have their own dedicated
drives.
Transaction logs for each Storage Group will also have their own drive
letters, as will the SMTP, MTA and Quorum drives, all on separate drive
letters with LUNs carved out of the SAN.
Used a theoretical value for IOPS per user of 0.85, and user mailbox limits
were decided as 180 MB.
Each Storage Group will hold 1875 users; did the standard calculation to
work out the size for each Database drive. Hence each node holds 7500 users,
but there is only 50 percent concurrency.
Database Drives are in RAID 10
Transaction drives were taken as 1/10 of the IOPS requirement of the Database
Drives, and will also be in RAID 10
How do you determine a safe size for the Transaction Log drives?
Also, what is the standard calculation to work out
SMTP drive size?
MTA drive size?
Quorum size?
Would you use RAID 1 for the SMTP, MTA and Quorum? What about IOPS for
these also?
Help greatly appreciated as always
Tony