Re: Cluster crashed, need recovery advice
- From: "John Toner [MVP]" <jtoner@xxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 20 Apr 2005 09:41:52 -0400
When you "Shut down cluster service" you should also make sure that you
disable the cluster disk driver. You should also make sure that you only
have one node powered on when you're creating the disk. I'd modify your
procedure as follows:
- First make absolutely sure I have a good backup (duh)
- Remove those 2 failed disks from the Cluster Config.
- Shut the cluster service down *** and disable cluster disk driver ***
- Remove the disks from the Windows device manager.
- Shut down this node. Start up the other node (without starting the cluster
service there) and remove the 2 disks there as well.
- *** Disable Cluster Disk driver and *** shut down that node.
- On the SAN: low-level format those 2 disks.
- Start up the first cluster node without enabling the cluster service
- Add the disks back to Windows, format, assign drive letter
- *** re-enable cluster disk driver then power down first node ***
- Start up the second cluster node without enabling the cluster service
- Add the disks back to Windows, assign drive letter
- *** re-enable cluster disk driver then shut down this node ***
- Restart the cluster service on node 1 and re-attach the 2 disks.
- Restart the cluster service on node 2
- Fail over cluster group 3 to node 2.
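If it helps to sanity-check the ordering before touching the real machines, the steps above can be replayed against a small model. This is only a rough sketch in Python: the action names, the Node class and the check() function are made up for illustration and are not part of any Windows or cluster tooling. The rules it enforces are my reading of the list (cluster service stopped before disks are removed or added, cluster disk driver disabled and only one node powered on while disks are added, both nodes down during the SAN format, driver re-enabled before the cluster service is restarted).

```python
from dataclasses import dataclass


@dataclass
class Node:
    powered: bool = False
    service: bool = False  # cluster service running
    driver: bool = True    # cluster disk driver enabled


def check(steps):
    """Replay (action, node) pairs and return a list of rule violations."""
    # Assumed starting point: node 1 is up and running the cluster alone.
    nodes = {"node1": Node(powered=True, service=True), "node2": Node()}
    problems = []

    def powered_on():
        return [name for name, n in nodes.items() if n.powered]

    for number, (action, name) in enumerate(steps, start=1):
        node = nodes.get(name)  # None for the SAN-side step
        if action == "power_on":
            node.powered = True
        elif action == "power_off":
            node.powered, node.service = False, False
        elif action == "stop_service":
            node.service = False
        elif action == "start_service":
            if not node.driver:
                problems.append(f"step {number}: driver still disabled on {name}")
            node.service = True
        elif action == "disable_driver":
            node.driver = False
        elif action == "enable_driver":
            node.driver = True
        elif action == "format_san":
            if powered_on():
                problems.append(f"step {number}: nodes still powered on")
        elif action in ("remove_disks", "add_disks"):
            if node.service:
                problems.append(f"step {number}: cluster service running on {name}")
            if action == "add_disks":
                if node.driver:
                    problems.append(f"step {number}: driver enabled on {name}")
                if powered_on() != [name]:
                    problems.append(f"step {number}: more than one node powered on")
        else:
            raise ValueError(f"unknown action: {action}")
    return problems


# The modified procedure, written out as (action, node) pairs.
REVISED = [
    ("stop_service", "node1"), ("disable_driver", "node1"),
    ("remove_disks", "node1"), ("power_off", "node1"),
    ("power_on", "node2"), ("remove_disks", "node2"),
    ("disable_driver", "node2"), ("power_off", "node2"),
    ("format_san", None),
    ("power_on", "node1"), ("add_disks", "node1"),
    ("enable_driver", "node1"), ("power_off", "node1"),
    ("power_on", "node2"), ("add_disks", "node2"),
    ("enable_driver", "node2"), ("power_off", "node2"),
    ("power_on", "node1"), ("start_service", "node1"),
    ("power_on", "node2"), ("start_service", "node2"),
]

# The plan as originally posted: no driver steps, both nodes up at the end.
ORIGINAL = [
    ("stop_service", "node1"), ("remove_disks", "node1"), ("power_off", "node1"),
    ("power_on", "node2"), ("remove_disks", "node2"), ("power_off", "node2"),
    ("format_san", None),
    ("power_on", "node1"), ("add_disks", "node1"),
    ("power_on", "node2"), ("add_disks", "node2"),
    ("start_service", "node1"), ("start_service", "node2"),
]

if __name__ == "__main__":
    print("revised:", check(REVISED))
    for problem in check(ORIGINAL):
        print("original:", problem)
```

The model says nothing about what the hardware will actually do. It only flags a step list that breaks one of the ordering rules, such as adding the disks on node 2 while node 1 is still powered on.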
Regards,
John
"Tonny van Geloof" <tvNOgeloof@xxxxxxxxxxx> wrote in message
news:hhva61duuh42142c8175cp0hs30opu6mtr@xxxxxxxxxx
> Hello everybody
>
> I've got a serious problem with a W2K cluster.
> I hope there is someone out there who can offer some advice.
>
> I've got a SAN with 11 logical disks defined, which are accessed by a
> cluster of 2 physical nodes.
> 1 disk is the quorum, the other 10 are data disks.
> 3 cluster groups:
> 1: only the quorum drive
> 2: 5 of the data disks
> 3: the other 5 data disks
> Normally node 1 runs cluster group 1+2, and node 2 runs cluster group
> 3 for load balancing purposes.
>
>
> A couple of days ago the Cluster Service itself crashed on node 2,
> causing a failover of cluster group 3 to node 1.
> So far no harm done.
> When I noticed this (it happened overnight) I decided to reboot node 2,
> assuming everything would come back online and I would just have to do
> a failover on cluster group 3.
> To my horror the Cluster Service on node 1 died when node 2 tried to
> rejoin the cluster.
> All 11 disks were gone.
>
> After a lot of searching I figured out that somehow the administration
> in the "Cluster Disk Device Driver" had gotten screwed up.
> Disabling that driver gave me back the disks and all data turned out
> to be safe.
> After clearing the "Signatures" key in the registry and restarting
> clusdisk.sys I still have access to those disks, as long as I do that
> on 1 node while the other node is shut down. As soon as the second
> node comes back online all disks immediately disappear.
> So I can run on one node only.
> When I restart the Cluster Service everything works, but 2 of the
> data disks remain in fail status in the Cluster Admin, although those
> disks show up under "My Computer" and are accessible without any
> problems.
> Eventually I just copied the content of those 2 disks to other
> locations on the disks and I changed the various cluster shares to
> point to the new locations.
>
> So I'm back online, with limited performance and no redundancy.
>
> I will have to recover somehow...... I came up with the following
> strategy:
>
> - First make absolutely sure I have a good backup (duh)
> - Remove those 2 failed disks from the Cluster Config.
> - Shut the cluster service down
> - Remove the disks from the Windows device manager.
> - Shut down this node. Start up the other node (without starting the
> cluster service there) and remove the 2 disks there as well.
> - Shut down that node.
> - On the SAN: low-level format those 2 disks.
> - Start up the first cluster node without enabling the cluster service
> - Add the disks back to Windows, format, assign drive letter
> - Start up the second cluster node without enabling the cluster service
> - Add the disks back to Windows, assign drive letter
> - Restart the cluster service on node 1 and re-attach the 2 disks.
> - Restart the cluster service on node 2
> - Fail over cluster group 3 to node 2.
>
> If anything goes wrong again I should be able to get at least back to
> the 1 node cluster config I'm now running on.
>
> Any comments, suggestions, bright ideas, advice are welcome.
>
>
>
>