Re: Exchange 2007 SCC Cluster - Slow Failover



For an SCC cluster, Exchange 2007 needs to flush the database cache to disk
prior to moving the mailbox server. In RTM, CCR also flushed the cache
before a move. In SP1, in epic form, a lone programmer in the bowels of the
beast, feverishly slaving over the keyboard during a marathon moment more
akin to eternity, throat dry as the great Gobi, knees buckling, fighting the
rapidly encircling darkness, had, in its blackest moment before the dawn, a
flash of inspiration. "fsck!" the exhausted shell that formerly resembled a
human being exclaimed. "What's the point in persisting the cache to a copy
of the database that won't be used?" And so, in an "oops, I could have had
an adult beverage!" moment, an optimization was born. CCR in SP1 does not
persist the cache before moving to the other node, and thus fails over much
faster if you have a large database with a large cache with a large number
of uncommitted pages all vying to write to that woefully inadequate
three-spindle SATA RAID 5 set, grinding and groaning as smoke pours forth,
filling the room with its acrid putridity, as two of the three drives,
whirring and straining and squealing like fingernails on a chalkboard, fail,
one after another, like dominoes, irretrievably corrupting the only sacred
copy of the vital business data, as there has been no successful backup in
the last two years, condemning the lost soul to the eternal void.


SCC has to flush uncommitted data to disk before moving the database. Check
your physical disk counters and you'll see what I mean. You really didn't
provide any information about performance load or capacity, so I took the
liberty of creating an information-rich, albeit wordy, fictional example. If
the goal is to stick with SCC, and a single copy of data in a given physical
location (do replicate it somewhere else, however, to prevent the single
point of failure doomsday scenario), then you may want to consider reconfiguring
your storage. A higher performance disk subsystem, a disk subsystem with a
lower write penalty, isolating databases and logs on sets of spindles sized
to handle the performance requirement, higher performance spindles; these
are all options you should consider in that case. Another option would be
SP1 and CCR. Either way, sounds like you'll need to spring for some disk.
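
To put rough numbers on the storage point, here is a minimal sketch of the
arithmetic. Everything except the 8 KB Exchange 2007 page size is an
assumption made up for illustration: the amount of dirty cache, the
per-spindle IOPS figures, and the usual rule-of-thumb write penalties (4 for
RAID 5, 2 for RAID 10). It also treats the flush as purely random page
writes and ignores controller cache and competing I/O, so take it as an
order-of-magnitude estimate, not a prediction for your hardware.

```python
# Rough estimate of how long a cache flush could take before an SCC move.
# All inputs below are illustrative assumptions, not measurements.

PAGE_KB = 8  # Exchange 2007 database page size


def flush_seconds(dirty_mb, spindles, iops_per_spindle, write_penalty):
    """Seconds to write dirty_mb of cache as random 8 KB page writes."""
    pages = dirty_mb * 1024 / PAGE_KB
    # Each logical write costs `write_penalty` physical I/Os on the array.
    write_iops = spindles * iops_per_spindle / write_penalty
    return pages / write_iops


# Hypothetical: 512 MB of dirty cache, three SATA spindles in RAID 5.
raid5 = flush_seconds(dirty_mb=512, spindles=3,
                      iops_per_spindle=80, write_penalty=4)

# Hypothetical: same dirty cache, eight faster spindles in RAID 10.
raid10 = flush_seconds(dirty_mb=512, spindles=8,
                       iops_per_spindle=180, write_penalty=2)

print(f"RAID 5, 3 spindles:  {raid5 / 60:.1f} min")
print(f"RAID 10, 8 spindles: {raid10 / 60:.1f} min")
```

Plug in the write rates you actually see in the physical disk counters
rather than these guesses; the point is only that spindle count and write
penalty dominate how long the flush takes.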


John

"Megan Kielman" <megan.kielman@xxxxxxxxx> wrote in message
news:OeDuafZdJHA.5540@xxxxxxxxxxxxxxxxxxxxxxx
All -

We have experienced slow failover times with our SCC cluster. For example,
we initiate the failover using the PS command "Move-ClusteredMailboxServer
ClusterName -TargetMachine:<DestinationServerName>" and the Application
log shows Event ID 111 (move requested) and 5 minutes later Event ID 113
with the description "Information Store cache flush before moving
clustered mailbox server 'ServerName' did not complete." After that event,
the cluster finally begins failing over.

What is the cache flush and why is it failing? Is 5 minutes a reasonable
amount of time for the failover to begin?

Thanks!

Megan

