Re: Cluster could not fail over
- From: "Ryan Sokolowski [MVP - Avanade]" <ryan@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 2 Aug 2005 09:54:54 -0700
No, I believe your testing is the culprit. When you remove the public
cables, you shouldn't expect the node to fail over its resources. The
crossover cable seems to be working perfectly. Only when the heartbeat
connection is broken should you expect the failover to occur.
Make sense?
--
Ryan Sokolowski
MVP - Windows Server - Clustering
MCSE, CCNA, CCDA, BCFP
Avanade
http://www.Avanade.com
"A troubleshooter's best tool is the Event Viewer and understanding the
events and messages contained therein."
This posting is provided "AS IS" with no warranties, and confers no rights.
"Philip" <Philip@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:E8500B49-0F40-4377-A5D7-FA822BA77A5B@xxxxxxxxxxxxxxxx
> Hi Ryan,
>
> So in this case, is the heartbeat the cause of the problem? We did not
> change the 500ms round trip for the failover. Could it be that the
> crossover cable for the heartbeat is not reliable?
>
> Thanks.
>
> Philip
>
> "Ryan Sokolowski [MVP - Avanade]" wrote:
>
>> Wouldn't this be normal, as the failure of (or lack of signal over) the
>> heartbeat network is what initiates a node resource failover? The nodes
>> should fail over resources when heartbeats are missed for a particular
>> period of time...
>>
>> "The network connections must be able to provide a maximum guaranteed
>> round trip latency between nodes of no more than 500 milliseconds. The
>> cluster uses heartbeat to detect whether a node is alive or not
>> responding. These heartbeats are sent out on a periodic basis. If a node
>> takes too long to respond to heartbeat packets, the cluster service
>> starts a heavy-weight protocol to figure out which nodes are really
>> still alive and which ones are dead; this is known as a cluster
>> re-group. The heartbeat interval is not a configurable parameter for the
>> cluster service (there are many reasons for this, but the bottom line is
>> that changing this parameter can have a significant impact on the
>> stability of the cluster and the failover time). 500ms round-trip is
>> significantly below any threshold to ensure that artificial re-group
>> operations are not triggered."
>>
>> "Philip" <Philip@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> news:35BE4DB5-D199-47FC-B2DB-6CD0862FE8D3@xxxxxxxxxxxxxxxx
>> > Hi everyone,
>> > Recently we set up 2 Windows 2003 Enterprise Servers with MS Clustering.
>> > The cluster was set up using the standard guide from Microsoft.
>> >
>> > The configuration is:
>> >
>> > 1) 3 NICs in each server: 2 onboard NICs using network teaming for
>> > network redundancy, and 1 for the heartbeat using a crossover cable.
>> >
>> > 2) 2 HBA cards for each server for SAN redundancy.
>> >
>> > After the cluster was set up, we ran failover tests by removing the 2
>> > public network cables from each server, and by rebooting and shutting
>> > down the servers. The failover tests completed without problems. Each
>> > server was able to take over the resources whenever the other was down.
>> >
>> > However, when we tried the test again today by removing the 2 public
>> > network cables from the server that holds the resources, the other
>> > server was not able to take over the resources. Only removing the
>> > private (heartbeat) cable or completely shutting down the server allows
>> > the failover to complete successfully.
>> >
>> > Would appreciate any advice on the above peculiar behaviour.
>> >
>> > Thanks.
>> >
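The heartbeat behaviour described in the quoted passage (heartbeats sent periodically, with a re-group started when a node takes too long to respond) can be sketched as a simple timeout check. This is a simplified Python model, not the cluster service's implementation: the interval and missed-beat limit below are hypothetical values chosen only for illustration (the passage says the real interval is not configurable and does not state it), and the class and method names are invented for this example.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; the real cluster service's
# heartbeat interval is not configurable and is not stated in this thread.
HEARTBEAT_INTERVAL_S = 1.0
MISSED_BEATS_BEFORE_REGROUP = 5


@dataclass
class HeartbeatMonitor:
    """Simplified model: remembers when each peer node was last heard from."""
    interval: float = HEARTBEAT_INTERVAL_S
    max_missed: int = MISSED_BEATS_BEFORE_REGROUP
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        # Called only when a heartbeat packet actually arrives from `node`.
        self.last_seen[node] = now

    def suspect_nodes(self, now: float) -> list[str]:
        # A node is suspect once its silence exceeds max_missed intervals.
        limit = self.interval * self.max_missed
        return sorted(n for n, t in self.last_seen.items() if now - t > limit)

    def needs_regroup(self, now: float) -> bool:
        # In this model, any suspect node would trigger a re-group.
        return bool(self.suspect_nodes(now))


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record_heartbeat("NODE1", now=0.0)
    monitor.record_heartbeat("NODE2", now=0.0)
    monitor.record_heartbeat("NODE1", now=6.0)  # NODE2 has gone silent
    print(monitor.suspect_nodes(now=6.0))  # ['NODE2']
```

In this model only heartbeats that arrive call `record_heartbeat`. Under Ryan's explanation, unplugging the public cables leaves the crossover heartbeat link up, so the peer keeps being heard from and no re-group starts; only pulling the heartbeat cable or shutting the node down makes it go silent.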