Re: Exchange 2007 SCC Cluster - Slow Failover




" The disks are on a SAN which would surprise me if
they were the bottleneck, but I will check to be sure."

Doesn't surprise me a bit. A SAN is not a panacea for performance issues,
and must ultimately follow the same basic sizing rules. Can you describe
the SAN and the disk layout?

John


<megan.kielman@xxxxxxxxx> wrote in message
news:c33472eb-97ab-4e57-8c5c-a11a58d9dbed@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Wow! Thank you for the excellent (and entertaining) response. We plan
to perform a failover next week. I will capture disk metrics then and
report back. Just so everyone knows, we are running Exchange 2007 SP1
RU4, soon to be RU5. The disks are on a SAN which would surprise me if
they were the bottleneck, but I will check to be sure.

On Jan 13, 11:43 am, "John Fullbright" <fjohn@donotspamnetappdotcom>
wrote:
For an SCC cluster, Exchange 2007 needs to flush the database cache to disk
prior to moving the mailbox server. In RTM, CCR also flushed the cache
before a move. In SP1, in epic form, a lone programmer in the bowels of the
beast, feverishly slaving over the keyboard during a marathon moment more
akin to eternity, throat dry as the great Gobi, knees buckling, fighting the
rapidly encircling darkness, in the blackest moment before the dawn, had a
flash of inspiration. "fsck!" the exhausted shell that formerly resembled a
human being exclaimed. "What's the point in persisting the cache to a copy
of the database that won't be used?" And so, in an "oops, I could have had
an adult beverage!" moment, an optimization was born. For CCR, SP1 does not
persist the cache before moving to the other node, and thus fails over much
faster if you have a large database with a large cache with a large number
of uncommitted pages all vying to write to that woefully inadequate three
SATA spindle RAID 5 set, grinding and groaning as smoke pours forth, filling
the room with its acrid putridity, as two of the three drives, whirring and
straining and squealing like fingernails on a chalkboard, fail, one after
another, like dominoes, irretrievably corrupting the only sacred copy of the
vital business data, with no successful backup in the last two years,
condemning the lost soul to the eternal void.

SCC has to flush uncommitted data to disk before moving the database. Check
your physical disk counters and you'll see what I mean. You really didn't
provide any information about performance load or capacity, so I took the
liberty of creating an information-rich, albeit wordy, fictional example. If
the goal is to stick with SCC, and a single copy of data in a given physical
location (do replicate it somewhere else, however, to prevent the single
point of failure doomsday scenario), then you may want to consider
reconfiguring your storage. A higher performance disk subsystem, a disk
subsystem with a lower write penalty, isolating databases and logs on sets
of spindles sized to handle the performance requirement, higher performance
spindles; these are all options you should consider in that case. Another
option would be SP1 and CCR. Either way, sounds like you'll need to spring
for some disk.
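
To put rough numbers on why the write penalty and spindle count matter for
the flush, here is a back-of-the-envelope sketch in Python. It is not a
sizing tool. It assumes every dirty cache page becomes one random write with
no coalescing, uses the common rule-of-thumb RAID write penalties, and uses
the 8 KB database page size of Exchange 2007. The function names, the dirty
cache size, and the per-spindle IOPS figures are all hypothetical values
chosen for illustration, not measurements from this thread.

```python
# Rule-of-thumb back-end I/Os generated per host write for each RAID level.
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def effective_write_iops(spindles, iops_per_spindle, raid_level):
    """Host-visible random write IOPS for a RAID set (ignores controller cache)."""
    return spindles * iops_per_spindle / RAID_WRITE_PENALTY[raid_level]


def flush_seconds(dirty_mb, page_kb, spindles, iops_per_spindle, raid_level):
    """Estimated time to write out the dirty cache, one random write per page."""
    dirty_pages = dirty_mb * 1024 / page_kb
    return dirty_pages / effective_write_iops(spindles, iops_per_spindle, raid_level)


if __name__ == "__main__":
    # Hypothetical scenarios: 2 GB of dirty cache, 8 KB pages.
    scenarios = [
        ("3 x SATA, RAID 5", 3, 75, "RAID5"),     # assumed ~75 IOPS per spindle
        ("8 x 15k, RAID 10", 8, 180, "RAID10"),   # assumed ~180 IOPS per spindle
    ]
    for label, spindles, iops, raid in scenarios:
        secs = flush_seconds(2048, 8, spindles, iops, raid)
        print(f"{label}: about {secs / 60:.1f} minutes to flush")
```

Real arrays have write cache and the store does coalesce some writes, so
actual flush times should be better than this worst case. The point is only
the ratio between the two layouts, which is why the measured physical disk
counters during a failover are the numbers to trust.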

John

"Megan Kielman" <megan.kiel...@xxxxxxxxx> wrote in message

news:OeDuafZdJHA.5540@xxxxxxxxxxxxxxxxxxxxxxx



All -

We have experienced slow failover times with our SCC cluster. For example,
we initiate the failover using the PS command
"Move-ClusteredMailboxServer ClusterName -TargetMachine:<DestinationServerName>"
and the Application log shows Event ID 111 (move requested) and, 5 minutes
later, Event ID 113 with the description "Information Store cache flush
before moving clustered mailbox server 'ServerName' did not complete." After
that event, the cluster finally begins failing over.

What is the cache flush and why is it failing? Is 5 minutes a reasonable
amount of time for the failover to begin?

Thanks!

Megan
