Re: Split brain scenario in cluster server system

From: Jason Nash [MSFT] (jasnas_at_online.microsoft.com)
Date: 07/14/04


Date: Wed, 14 Jul 2004 09:35:32 -0500

Whenever the cluster cannot determine which node owns the quorum, the
Challenge/Defense protocol is initiated.

Each node in the cluster renews the disk reservations that it owns, including
the one on the quorum disk, every three seconds.
    If the nodes of a cluster lose network communication with each other (for
    example, if there is no communication over the private or public network),
    the Cluster service (by using Clusdisk.sys) begins using the
    Challenge/Defense protocol. The Challenge/Defense protocol is the cluster
    node functionality that determines which nodes own the shared disks and
    which nodes are online and functioning. The Challenge/Defense protocol
    uses the SCSI commands for this functionality. The following procedure
    describes what occurs if the nodes in a cluster lose network connectivity
    and there is no available network for heartbeat communications:

1. The node that currently owns the quorum disk is called the
   "defender." The defender assumes that it is the only surviving node and
   continually renews its reservation on the quorum disk by issuing a SCSI
   reserve command every three seconds.

2. All other nodes (nodes that do not own the quorum disk) become the
   "challengers."

3. When a challenger detects the loss of heartbeat communications, it
   immediately issues a bus-wide SCSI reset command.

4. Ten seconds after the SCSI reset command is issued, the challenger
   tries to reserve the quorum disk. If the defender node is online and
   functioning, it will have already reserved the quorum disk again, as it
   typically does every three seconds. The challenger detects that it
   cannot reserve the quorum disk, and then shuts down its Cluster service.
   If the defender is not functioning properly, the challenger can
   successfully reserve the quorum disk. After ten seconds, the challenger
   brings the quorum online and takes ownership of all resources in the
   cluster.
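
To make the timing concrete, here is a rough Python sketch of one
challenge round. It is only a toy model of the steps above, not how
Clusdisk.sys is actually implemented: the QuorumDisk class, the simulate()
function and the node names are made up for illustration, and it assumes
that the bus-wide reset clears the existing reservation. The 3-second and
10-second values are the ones quoted above.

```python
RENEW_INTERVAL = 3    # seconds between defender renewals (from the post)
CHALLENGE_DELAY = 10  # seconds the challenger waits after the reset (from the post)


class QuorumDisk:
    """Toy stand-in for the shared quorum disk; not a real SCSI device."""

    def __init__(self, owner=None):
        self.owner = owner

    def reserve(self, node):
        # A reserve succeeds only if the disk is free or already held by this node.
        if self.owner in (None, node):
            self.owner = node
            return True
        return False

    def bus_reset(self):
        # Assumption: a bus-wide reset drops whatever reservation exists.
        self.owner = None


def simulate(defender_alive):
    """Return the name of the node that owns the quorum after one challenge."""
    disk = QuorumDisk(owner="defender")
    disk.bus_reset()  # step 3: the challenger resets the bus at t = 0
    for t in range(1, CHALLENGE_DELAY + 1):
        # step 1: a healthy defender re-reserves every RENEW_INTERVAL seconds
        if defender_alive and t % RENEW_INTERVAL == 0:
            disk.reserve("defender")
    # step 4: after the delay, the challenger tries to take the reservation
    if disk.reserve("challenger"):
        return "challenger"
    return "defender"
```

With a healthy defender, the reservation is back in place well before the
10 seconds are up, so simulate(True) leaves the defender as owner; with a
dead defender nobody re-reserves and simulate(False) hands the quorum to
the challenger. Either way only one node ends up holding the disk, which is
what prevents both SQL Server instances from going active.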

For more information, see Microsoft Knowledge Base article 309186, "How the
Cluster Service Takes Ownership of a Disk on the Shared Bus":
http://support.microsoft.com/?id=309186

-- 
Jason Nash [MSFT]
**Please do not send e-mail directly to this alias. This alias is for
newsgroup purposes only.**
This posting is provided "AS IS" with no warranties, and confers no rights. 
http://www.microsoft.com/info/cpyright.htm
"Morten" <usenet@kikobu.com> wrote in message 
news:%23cZo5OaaEHA.3476@tk2msftngp13.phx.gbl...
>
> Hi.
>
> If a server cluster has 2 nodes w. active/passive SQL servers, what 
> happens if the network between the 2 nodes stops working?
>
> Both SQL servers go active? Or does the shared storage contain data to 
> prevent this? (assuming the connection to the shared storage is intact).
>
> Br,
>
> Morten 

