Re: Cluster group doesn't come back online after network failure


Here's a _really_ basic batch script:
..............................................
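@echo off

:TOP

rem With the test below, this should be an address that answers only
rem while the group is online (for example, the cluster IP address).
ping <insert IP address>

rem ERRORLEVEL 1 (or higher) means ping got no reply.
if ERRORLEVEL 1 goto ONLINE

goto TOP

:ONLINE

rem No reply, so try to bring the group online, then resume checking.
cluster group "<insert cluster group name>" /online

goto TOP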
..............................................

Obviously your scripts can get much more complex/intelligent than this, but
this is just to get you started.
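
As one possible next step, here is a slightly more defensive sketch. Unlike
the script above, it pings an address that should answer whenever the public
link is up (LINK_IP, a made-up value here; your default gateway is one
candidate) and only acts when the group is reported as Failed. It assumes the
group is named "Cluster Group", as in your description, and that
"cluster group ... /status" prints the word Failed for a failed group. It
looks for "TTL=" in the ping output instead of relying on ERRORLEVEL, because
Windows ping can report success on a "Destination host unreachable" reply.
Treat it as a starting point to try on a test cluster, not a finished tool.
..............................................
@echo off
setlocal

rem Hypothetical settings; replace all three before use.
set "LINK_IP=192.0.2.1"
set "GROUP=Cluster Group"
set "WAIT_COUNT=30"

:CHECK
rem "TTL=" appears only in a real echo reply (English-language output assumed).
ping -n 2 %LINK_IP% | find "TTL=" >nul
if errorlevel 1 goto SLEEP

rem The link is up. Act only if the group is reported as Failed.
cluster group "%GROUP%" /status | find /i "Failed" >nul
if errorlevel 1 goto SLEEP

echo %DATE% %TIME% Link is up and "%GROUP%" is Failed; bringing it online.
cluster group "%GROUP%" /online

:SLEEP
rem Pause by pinging the loopback address (about one second per ping).
ping -n %WAIT_COUNT% 127.0.0.1 >nul
goto CHECK
..............................................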

Regards,
John


"bill" <belgie@xxxxxxxxxxx> wrote in message
news:%23Tbk7ZNUFHA.3392@xxxxxxxxxxxxxxxxxxxxxxx
> Thanks
>
> Can you give me some tips on writing the script you suggested?
>
> -Bill
>
> "John Toner [MVP]" <jtoner@xxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:%23Vs4G4MUFHA.1896@xxxxxxxxxxxxxxxxxxxxxxx
> > Bill,
> >
> > I would suggest reading the following KB article:
> >
> > Network failure detection and recovery in Windows Server 2003 Clusters
> > http://support.microsoft.com/kb/286342
> >
> > After the resource reaches its thresholds, it will remain in a failed
> > state until you retry the resource manually. You can adjust the
> > "Threshold" setting of the IP resource to a larger number if you want
> > MSCS to retry a few more times, though this will also increase the time
> > it takes to fail over the group after a network link failure on an
> > individual host.
> >
> > Another alternative would be to write a script that monitors the health
> > of your network links and automatically brings cluster resources online
> > if links are up.
> >
> > Regards,
> > John
> >
> > "bill" <belgie@xxxxxxxxxxx> wrote in message
> > news:eTWUeJMUFHA.1944@xxxxxxxxxxxxxxxxxxxxxxx
> > > The heartbeat cable is fine, because if I disconnect it I get a
> > > message that the cable is disconnected. Also, I can ping and tracert
> > > the private NIC IP on the other node.
> > >
> > > I have checked the settings you mentioned, and other settings
> > > mentioned in Q258750.
> > >
> > > If I disconnect both Ethernet cables from the public NICs on the nodes
> > > and reconnect one or both of them within about 10 seconds, the cluster
> > > group comes back online. If I wait any longer, it doesn't come back
> > > online automatically. Under the Groups folder, the Cluster Group status
> > > remains 'Failed', and under Resources, Cluster IP Address remains
> > > 'Failed'.
> > >
> > > After I reconnect the cables, I can manually bring the cluster group
> > > online.
> > >
> > > However, I have noticed that after I disconnect both cables for a while
> > > and manually bring the cluster group online, the nodes no longer fail
> > > over if I disconnect the cable of the active node. The cluster group
> > > returns to Failed status and stays there. I have to reboot the cluster
> > > to restore the failover capability.
> > >
> > > I expect I have overlooked something important, but I can't find it.
> > > If you have any ideas, please let me know, because the cluster is
> > > useless in its present state. If the network fails briefly, or is shut
> > > down for maintenance in the middle of the night or on the weekend, the
> > > cluster has to be manually brought online and rebooted.
> > >
> > > Thanks!
> > > -Bill
> > >
> > > Bill,
> > >
> > > To verify that your heartbeat is working, you could disconnect it from
> > > one node and see if the other node reports the loss of connection. You
> > > should verify the properties of the heartbeat network in Cluster
> > > Administrator to make sure it is set for internal communications only.
> > > The "public" connection properties should be set to both internal and
> > > public; this way, if your heartbeat connection fails, the heartbeat
> > > will go across the public network.
> > > Remember that for the heartbeat to work, the cable needs to be a
> > > "crossover" cable, not a straight-through "standard" network cable, if
> > > the servers are directly connected to each other. If it goes through a
> > > hub, then standard cables will work.
> > >
> > > Hope that helps!
> > > --
> > > Mark
> > >
> > >
> > > "bill" wrote:
> > >
> > > > Our Win2K3 cluster group stays offline after a network failure, even
> > > > after the network service is restored.
> > > >
> > > > For example, if I disconnect the Public network cables from both
> > > > nodes (simulating a network failure), and then reconnect the cables,
> > > > I have to manually bring the Cluster group and Cluster IP address
> > > > online. They don't automatically come back online.
> > > >
> > > > If I just disconnect one cable, the cluster fails over without a
> > > > problem.
> > > >
> > > > The heartbeat seems OK; at least, the icon indicates the nodes are
> > > > communicating over the Private network OK.
> > > >
> > > > I would appreciate any suggestions.
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>

