Re: Failed cluster node confusion!



Normally the second node would take ownership of all your groups and
resources.
Since that did not happen, something is not working properly.

I would start by analyzing the following on the surviving node, around the
time the first node failed:
1) any clues / errors in the system event log
2) any clues / errors in the cluster.log (note that cluster.log timestamps
are in GMT, not "host" time)
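
Because cluster.log is stamped in GMT, the failure time you noted on the host has to be converted before you search the log. A minimal sketch of that conversion in Python follows; the function name, the 10-minute margin, the example date and the +1 hour offset are all illustrative assumptions, not values from this thread.

```python
from datetime import datetime, timedelta, timezone

def cluster_log_window(local_failure_time, utc_offset_hours, minutes_either_side=10):
    """Return the (start, end) GMT window to look for in cluster.log.

    local_failure_time: naive datetime in host-local time (assumed input).
    utc_offset_hours:   the host's offset from GMT at that moment (assumed input).
    """
    host_tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the host offset, then convert to GMT/UTC.
    failure_gmt = local_failure_time.replace(tzinfo=host_tz).astimezone(timezone.utc)
    margin = timedelta(minutes=minutes_either_side)
    return failure_gmt - margin, failure_gmt + margin

# Hypothetical example: node failed at 14:30 host time on a host running at GMT+1.
start, end = cluster_log_window(datetime(2008, 1, 15, 14, 30), utc_offset_hours=1)
print(start.strftime("%Y/%m/%d %H:%M"), "to", end.strftime("%Y/%m/%d %H:%M"), "GMT")
```

The system event log is shown in host time, so the same window can be used there without conversion.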

Blue exclamation marks usually mean that the cluster service has terminated
(on node 2).
That in turn usually means it could not arbitrate for the quorum disk.
But please note the word "usually" ... only analysis will tell us for sure.

rgds,
edwin.


"SW" <siwilson@xxxxxxxxx> wrote in message
news:bd6d5576-5e6b-450f-b514-d56e5278faf2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hello All!

Hope someone can help me with an issue I have with my 3 shared disk
clusters...

I was doing some tests with a Windows Server 2003 SP1 2 node cluster
which is connected to an HP EVA SAN for storage. Everything is
redundant including the HBA cards, NICs (they are teamed) etc etc. The
heartbeat is configured to go over a crossover cable between the two
nodes. If this fails then the heartbeat will go over the teamed NIC
connection.

The client asked me to do some testing to see how resilient the
solution was. All went really well when doing the following tests:

1) Shutting down one of the nodes
2) Unplugging one or both of the ethernet cables from the teamed NIC

When doing the above tests, everything failed over to the remaining
working node correctly.

The problem came when simulating a catastrophic failure by literally
unplugging one of the nodes' power supplies (both of them). When I did
this, the second node did NOT fail over the resources. Can someone
explain why this is the case? We discovered this by accident when one
of our nodes blue screened and the resources didn't fail over. Is this
a design limitation? When I opened Cluster Administrator all the nodes
and resources had a blue exclamation point through them. How does one
get the resources to automatically (or even manually) fail over to the
working node when one node has completely died (blue screen, hardware
failure, etc)?? What is the correct procedure to follow when one of
the nodes in a cluster completely fails? Does the remaining working
node have to "seize" the resources?

I can't seem to find anything regarding this issue on the net.

Any help will be much appreciated! ;-)


