Re: Unable to failover when public network cables are removed



The resources will only fail over IF the HEARTBEAT network is broken or
interrupted, not the public network. The heartbeat is the gauge for
determining node failure and subsequent resource failover.
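
The heartbeat can only act as that gauge if the network carrying it is
enabled for internal cluster communication. As a rough sketch (Python, not
part of any cluster tool), the network roles Chuck Timon asks about in the
quoted reply correspond to the CLUSTER_NETWORK_ROLE values in the Win32
cluster API header clusapi.h. The helpers allows_internal_traffic and
check_roles, and the network names "Public" and "Private", are hypothetical
and only for illustration. The check flags a setup where no network, or only
one network, is enabled for internal traffic; the arrangement Chuck's
question points toward (public = All Communications, private = Internal
Communications Only) gives two such paths.

```python
from enum import IntEnum


class ClusterNetworkRole(IntEnum):
    """CLUSTER_NETWORK_ROLE values from clusapi.h (Win32 cluster API)."""
    NONE = 0                 # "Enable this network for cluster use" unchecked
    INTERNAL_USE = 1         # "Internal cluster communications only (private network)"
    CLIENT_ACCESS = 2        # "Client access only (public network)"
    INTERNAL_AND_CLIENT = 3  # "All communications (mixed network)"


def allows_internal_traffic(role: ClusterNetworkRole) -> bool:
    # Bit 0x1 (internal use) is what permits node-to-node cluster
    # communication on a network; roles 1 and 3 have it set.
    return bool(role & ClusterNetworkRole.INTERNAL_USE)


def check_roles(roles: dict[str, ClusterNetworkRole]) -> list[str]:
    """Return warnings for a hypothetical {network name: role} mapping."""
    internal = [name for name, role in roles.items() if allows_internal_traffic(role)]
    if not internal:
        return ["no network is enabled for internal cluster communication"]
    if len(internal) == 1:
        return [f"only {internal[0]!r} carries internal traffic; no fallback path"]
    return []


if __name__ == "__main__":
    # Hypothetical network names; roles as suggested in the quoted reply.
    suggested = {
        "Public": ClusterNetworkRole.INTERNAL_AND_CLIENT,
        "Private": ClusterNetworkRole.INTERNAL_USE,
    }
    # Same networks, but the public one set to client access only.
    public_client_only = {
        "Public": ClusterNetworkRole.CLIENT_ACCESS,
        "Private": ClusterNetworkRole.INTERNAL_USE,
    }
    print(check_roles(suggested))           # prints []
    print(check_roles(public_client_only))  # prints ["only 'Private' carries internal traffic; no fallback path"]
```

With the public network set to client access only, the crossover cable is the
single path for internal traffic, which is the case the second warning is
meant to surface.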

See my earlier comments in the thread from Philip, "Cluster could not fail
over"; same issue (same person/company?) :)

HTH...

--
Ryan Sokolowski
MVP - Windows Server - Clustering
MCSE, CCNA, CCDA, BCFP
Avanade
http://www.Avanade.com

"A troubleshooter's best tool is the Event Viewer and understanding the
events and messages contained therein."

This posting is provided "AS IS" with no warranties, and confers no rights.

"Chuck Timon [MSFT]" <ctimon@xxxxxxxxxxxxx> wrote in message
news:etF5jE2lFHA.3144@xxxxxxxxxxxxxxxxxxxxxxx
> Can you provide a little more information here? Is the Public Network
> configured for 'All Communications'? Is the Private Network configured
> for 'Internal Communications Only'?
>
> --
> Chuck Timon, Jr.
> Microsoft Corporation
> CCE Beta Engineer
> This posting is provided "AS IS" with no
> warranties, and confers no rights.
> "Unable to failover" <Unabletofailover@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
> message news:75541467-680F-45D4-89BF-26D2D721F8AB@xxxxxxxxxxxxxxxx
>> Hi Gerald,
>>
>> Sorry for the confusion. There is only one standard cluster group. Inside
>> this cluster group there are only the Cluster IP Address, Cluster Name
>> and Quorum Disk resources. There are no other resources.
>>
>> So the problem is that when we pull out the 2 public network cables from
>> node1, the Cluster IP and Name go offline while the Quorum Disk remains
>> online. The group did not fail over to node2. This is a new setup, and
>> it worked initially.
>>
>> Hope this clarifies the problem description.
>>
>> Thanks and Regards,
>> Lester
>>
>> "Gerald Aigenbauer" wrote:
>>
>>> Hi!
>>>
>>> You should not put any resources needed by users in the standard cluster
>>> group. Create another group for the user resources.
>>>
>>> gerald aigenbauer.
>>>
>>> "Unable to failover" <Unabletofailover@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>>> message news:4AC12929-7A21-4C1D-B251-5E239DC2FBAF@xxxxxxxxxxxxxxxx
>>> > Hi Gerald, thank you for the reply. But we are able to get it to
>>> > refresh sometimes, even when the resources fail. These resources have
>>> > failed over to another node.
>>> >
>>> > And regarding the other issue, where only the cluster IP and name went
>>> > offline but the quorum disk remained online, do you have any advice on
>>> > that? When node1 has its 2 public network cables pulled out, the
>>> > cluster IP and name go offline, but the quorum disk remains online. As
>>> > a result, node2 is unable to take over the quorum, and the cluster
>>> > group is inaccessible because its resources are offline.
>>> >
>>> > Regards,
>>> > Lester
>>> >
>>> > "Gerald Aigenbauer" wrote:
>>> >
>>> >> Hi Mr. Unable to failover!
>>> >>
>>> >> You cannot connect Cluster Admin to the cluster name if the cluster
>>> >> name resource or its virtual IP address is offline. You can connect
>>> >> Cluster Admin to the name "." on a node, or to the local cluster node
>>> >> name.
>>> >>
>>> >> gerald aigenbauer.
>>> >>
>>> >> "Unable to failover" <Unable to failover@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote
>>> >> in message news:62419728-B3EC-42A3-A0AE-65167C9EDDCF@xxxxxxxxxxxxxxxx
>>> >> > Hi,
>>> >> >
>>> >> > We have set up 2 Windows 2003 Enterprise Servers with MS Clustering.
>>> >> > The cluster was set up using the standard guide from Microsoft.
>>> >> >
>>> >> > The configuration is as follows:
>>> >> >
>>> >> > 1) 3 NICs in each server: 2 onboard NICs using Network Teaming for
>>> >> > network redundancy, and 1 for the heartbeat using a crossover cable.
>>> >> >
>>> >> > 2) 2 HBA cards in each server for SAN redundancy.
>>> >> >
>>> >> > We are testing failover by removing server1's network connections on
>>> >> > the 2 public interfaces, which are configured for teaming. It worked
>>> >> > well initially, that is, server1 was able to fail over to server2.
>>> >> > But then there came a time when it refused to fail over. The Cluster
>>> >> > IP Address and Cluster Name went offline, but the Quorum disk still
>>> >> > showed online. As a result, the Cluster Group was down because those
>>> >> > 2 resources were offline, and we had to bring them online manually.
>>> >> > When we tried the same thing again by pulling out the 2 public
>>> >> > network cables, it happened again: the group simply refused to fail
>>> >> > over.
>>> >> >
>>> >> > We also noticed that when the 2 public network cables were pulled
>>> >> > out, Cluster Admin on server1 had problems refreshing. When I
>>> >> > connected Cluster Admin using server1's name instead of the Cluster
>>> >> > Name, it was fine. Shouldn't it be able to refresh properly?
>>> >> >
>>> >> > Hope someone can help.
>>> >> >
>>> >> > Thanks a lot.
>>> >> >
>>> >> > Regards,
>>> >> > Lester
>>> >> >
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>
>
>

