Re: Cluster group doesn't come back online after network failure
- From: "bill" <belgie@xxxxxxxxxxx>
- Date: Wed, 4 May 2005 14:23:47 -0400
Thanks
Can you give me some tips on writing the script you suggested?
-Bill
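A minimal sketch of the kind of monitoring script John suggests in the quoted reply below, written in Python. It makes several assumptions. It assumes a Python interpreter is installed on the node, since Python does not ship with Windows Server 2003; the same logic could be written as a batch file or VBScript. It assumes cluster.exe, the MSCS command-line tool, is on the PATH. It assumes ping uses the Windows -n/-w syntax. It also assumes that a failed group's status listing contains the word "Failed". The probe address 192.0.2.1 and all helper names are hypothetical placeholders. This is not a tested recovery tool. It also does nothing about the lost failover capability Bill describes after a manual recovery.

```python
import subprocess
from typing import Callable, Iterable, List

# Hypothetical values: replace with the group name and public-network
# addresses (for example, the default gateway) used on your cluster.
GROUP_NAME = "Cluster Group"
PROBE_HOSTS = ["192.0.2.1"]


def ping(host: str) -> bool:
    """Send one echo request with Windows ping syntax; True if ping exits with 0."""
    # -n 1: send one request; -w 1000: wait up to 1000 ms for a reply.
    result = subprocess.run(["ping", "-n", "1", "-w", "1000", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def run_command(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def status_command(group: str) -> List[str]:
    # With no action switch, cluster.exe lists the group's current status.
    return ["cluster", "group", group]


def online_command(group: str) -> List[str]:
    # Ask the Cluster service to bring the whole group online.
    return ["cluster", "group", group, "/online"]


def should_recover(links_up: Iterable[bool], status_output: str) -> bool:
    """Recover only when at least one probe succeeded and the group reports Failed."""
    # Assumption: a failed group's status listing contains the word "Failed".
    return any(links_up) and "failed" in status_output.lower()


def check_and_recover(group: str = GROUP_NAME,
                      hosts: Iterable[str] = PROBE_HOSTS,
                      probe: Callable[[str], bool] = ping,
                      run: Callable[[List[str]], str] = run_command) -> bool:
    """One monitoring pass. Returns True if an online request was issued."""
    links_up = [probe(h) for h in hosts]
    if not any(links_up):
        return False  # network still down: leave the group alone
    if should_recover(links_up, run(status_command(group))):
        run(online_command(group))
        return True
    return False
```

Each call to check_and_recover() performs one pass. A scheduled task, or a loop with a pause between calls, would run it periodically. The probe and run parameters can be replaced with stand-ins. For example, check_and_recover(probe=lambda h: True, run=lambda cmd: "Failed") exercises the decision logic without touching a real cluster.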
"John Toner [MVP]" <jtoner@xxxxxxxxxxxxxxxxxxxxx> wrote in message
news:%23Vs4G4MUFHA.1896@xxxxxxxxxxxxxxxxxxxxxxx
> Bill,
>
> I would suggest reading the following KB article:
>
> Network failure detection and recovery in Windows Server 2003 Clusters
> http://support.microsoft.com/kb/286342
>
> After the resource reaches its thresholds, it will remain in a failed state
> until you retry the resource manually. You can set the "Threshold" value
> of the IP resource to a larger number if you want MSCS to retry a few more
> times, though this will also increase the time it takes to fail over the
> group when the network link fails on an individual host.
>
> Another alternative would be to write a script that monitors the health of
> your network links and automatically brings cluster resources online if
> links are up.
>
> Regards,
> John
>
> "bill" <belgie@xxxxxxxxxxx> wrote in message
> news:eTWUeJMUFHA.1944@xxxxxxxxxxxxxxxxxxxxxxx
> > The heartbeat cable is fine, because if I disconnect it I get a message
> > that the cable is disconnected. Also, I can ping and tracert the private
> > NIC IP on the other node.
> >
> > I have checked the settings you mentioned, and other settings mentioned
> > in Q258750.
> >
> > If I disconnect both Ethernet cables from the Public NICs on the nodes
> > and reconnect one or both of them within about 10 seconds, the cluster
> > group comes back online. If I wait any longer, it doesn't come back
> > online automatically. Under the Groups folder, the Cluster Group status
> > remains 'Failed', and under Resources, Cluster IP Address remains
> > 'Failed'.
> >
> > After I reconnect the cables, I can manually bring the cluster group
> > online.
> >
> > However, I have noticed that after I disconnect both cables for a while
> > and manually bring the cluster group online, the nodes no longer fail
> > over if I disconnect the cable of the active node. The cluster group
> > returns to Failed status and stays there. I have to reboot the cluster to
> > restore the failover capability.
> >
> > I expect I have overlooked something important, but I can't find it. If
> > you have any ideas, please let me know, because the cluster is useless in
> > its present state. If the network fails briefly, or is shut down for
> > maintenance in the middle of the night or on the weekend, the cluster has
> > to be brought online manually and rebooted.
> >
> > Thanks!
> > -Bill
> >
> > Bill,
> >
> > To verify that your heartbeat is working, you could disconnect it from
> > one node and see whether the other node reports the loss of connection.
> > You should also check the properties of the heartbeat network in Cluster
> > Administrator to make sure it is set for internal communications only.
> > The "public" connection should be set for both internal and public
> > communications; that way, if your heartbeat connection fails, cluster
> > communication will go across the public network.
> > Remember that for the heartbeat to work, the cable needs to be a
> > "crossover" cable, not a straight-through "standard" network cable, if
> > the servers are connected directly to each other. If the connection goes
> > through a hub, standard cables will work.
> >
> > Hope that helps!
> > --
> > Mark
> >
> >
> > "bill" wrote:
> >
> > > Our Win2K3 cluster group stays offline after a network failure, even
> > > after the network service is restored.
> > >
> > > For example, if I disconnect the Public network cables from both nodes
> > > (simulating a network failure) and then reconnect the cables, I have to
> > > manually bring the Cluster Group and Cluster IP Address online. They
> > > don't come back online automatically.
> > >
> > > If I disconnect just one cable, the cluster fails over without a
> > > problem.
> > >
> > > The heartbeat seems OK; at least, the icon indicates that the nodes are
> > > communicating over the Private network.
> > >
> > > I would appreciate any suggestions.
> > >
> > >
> > >
> >
> >
>
>
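John's point about resource thresholds can be pictured with a simplified model. Each failure inside a counting window uses up one restart. Once the count exceeds the threshold, the resource is left in the failed state instead of being restarted. This is a sketch of the idea only, not the exact algorithm MSCS uses. The class name, the threshold and period values, and the 5-second spacing between detected failures are all hypothetical, not the product's defaults.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestartPolicy:
    """Simplified sliding-window restart counter (not MSCS's exact algorithm)."""
    threshold: int      # restarts allowed inside one window (hypothetical value)
    period_s: float     # window length in seconds (hypothetical value)
    failures: List[float] = field(default_factory=list)

    def on_failure(self, t: float) -> str:
        """Record a failure at time t; return "restart" or "give up"."""
        # Forget failures that have slid out of the counting window.
        self.failures = [f for f in self.failures if t - f < self.period_s]
        self.failures.append(t)
        # Restart while the count is within the threshold; otherwise stay failed.
        return "restart" if len(self.failures) <= self.threshold else "give up"


# Cable pulled: the resource fails again every 5 seconds (hypothetical spacing).
low = RestartPolicy(threshold=3, period_s=900)
print([low.on_failure(t) for t in (0, 5, 10, 15)])
```

With a threshold of 3 and a failure every 5 seconds, the fourth failure, at 15 seconds, exhausts the restarts and the resource stays failed. A larger threshold buys more retries, but the resource keeps restarting longer before it gives up. That is the same trade-off John describes: a higher "Threshold" value also delays failover.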