Re: Time of failover of Microsoft SQL 2000

From: Patrice (krakowpat_at_yahoo.com)
Date: 02/25/05


Date: 25 Feb 2005 07:08:57 -0800

Hello Joe,

After Mike's 10-32 seconds, you report < 10 seconds. We must be doing
something very wrong to end up with ~ 2 min on an unloaded SQL
cluster.

But I also think that, at this stage, we should make sure that we are
measuring the failover duration the same way ;-) Last weekend we had
to install some MS patches, and we performed two complete failovers.
Here are the details of the 2nd failover (the faster one), which took
2 min 9 sec:

 - NODE01 17:25:30 The Cluster Service is attempting to offline the
Resource Group "Cluster Group".
 - NODE01 17:25:30 The Cluster Service brought the Resource Group
"Cluster Group" offline.
 - NODE02 17:25:55 The Cluster Service is attempting to bring online
the Resource Group "Cluster Group".
 - NODE02 17:25:59 The Cluster Service brought the Resource Group
"Cluster Group" online.
 - NODE01 17:26:11 The Cluster Service is attempting to offline the
Resource Group "MSDTC".
 - NODE01 17:26:12 The Cluster Service brought the Resource Group
"MSDTC" offline.
 - NODE01 17:26:32 The Cluster Service is attempting to offline the
Resource Group "SQL01".
 - NODE02 17:26:34 The Cluster Service is attempting to bring online
the Resource Group "MSDTC".
 - NODE01 17:26:39 The Cluster Service brought the Resource Group
"SQL01" offline.
 - NODE02 17:26:47 The Cluster Service brought the Resource Group
"MSDTC" online.
 - NODE01 17:27:00 The Cluster Service is attempting to offline the
Resource Group "SQL02".
 - NODE02 17:27:00 The Cluster Service is attempting to bring online
the Resource Group "SQL01".
 - NODE01 17:27:08 The Cluster Service brought the Resource Group
"SQL02" offline.
 - NODE02 17:27:11 The Cluster Service brought the Resource Group
"SQL01" online.
 - NODE02 17:27:28 The Cluster Service is attempting to bring online
the Resource Group "SQL02".
 - NODE02 17:27:39 The Cluster Service brought the Resource Group
"SQL02" online.
[00:02:09]

I must admit that the administrator moved the four groups one by one,
which is probably not the most efficient way; any advice on this topic
is welcome! But if we take the four moves independently, we can see
that each of them still takes > 10 sec:

Move of the "Cluster Group" group:
 - NODE01 17:25:30 The Cluster Service is attempting to offline the
Resource Group "Cluster Group".
 - NODE01 17:25:30 The Cluster Service brought the Resource Group
"Cluster Group" offline.
 - NODE02 17:25:55 The Cluster Service is attempting to bring online
the Resource Group "Cluster Group".
 - NODE02 17:25:59 The Cluster Service brought the Resource Group
"Cluster Group" online.
[00:00:29]

Move of the "MSDTC" group:
 - NODE01 17:26:11 The Cluster Service is attempting to offline the
Resource Group "MSDTC".
 - NODE01 17:26:12 The Cluster Service brought the Resource Group
"MSDTC" offline.
 - NODE02 17:26:34 The Cluster Service is attempting to bring online
the Resource Group "MSDTC".
 - NODE02 17:26:47 The Cluster Service brought the Resource Group
"MSDTC" online.
[00:00:36]

Move of the "SQL01" group:
 - NODE01 17:26:32 The Cluster Service is attempting to offline the
Resource Group "SQL01".
 - NODE01 17:26:39 The Cluster Service brought the Resource Group
"SQL01" offline.
 - NODE02 17:27:00 The Cluster Service is attempting to bring online
the Resource Group "SQL01".
 - NODE02 17:27:11 The Cluster Service brought the Resource Group
"SQL01" online.
[00:00:39]

Move of the "SQL02" group:
 - NODE01 17:27:00 The Cluster Service is attempting to offline the
Resource Group "SQL02".
 - NODE01 17:27:08 The Cluster Service brought the Resource Group
"SQL02" offline.
 - NODE02 17:27:28 The Cluster Service is attempting to bring online
the Resource Group "SQL02".
 - NODE02 17:27:39 The Cluster Service brought the Resource Group
"SQL02" online.
[00:00:39]
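
The bracketed durations above can be re-derived from the Event Log
timestamps. The following is only a minimal sketch in Python, assuming
each event has been joined onto a single line in the form
"NODE HH:MM:SS message"; the names parse_events, phase_seconds and
overall_seconds are hypothetical helpers, not part of any cluster
tool. It also splits each move into three phases: taking the group
offline, the gap before the other node starts bringing it online, and
bringing it online.

```python
import re
from datetime import datetime

# Event Log lines copied from the post, one event per line (assumed layout).
EVENTS = """\
NODE01 17:25:30 The Cluster Service is attempting to offline the Resource Group "Cluster Group".
NODE01 17:25:30 The Cluster Service brought the Resource Group "Cluster Group" offline.
NODE02 17:25:55 The Cluster Service is attempting to bring online the Resource Group "Cluster Group".
NODE02 17:25:59 The Cluster Service brought the Resource Group "Cluster Group" online.
NODE01 17:26:11 The Cluster Service is attempting to offline the Resource Group "MSDTC".
NODE01 17:26:12 The Cluster Service brought the Resource Group "MSDTC" offline.
NODE01 17:26:32 The Cluster Service is attempting to offline the Resource Group "SQL01".
NODE02 17:26:34 The Cluster Service is attempting to bring online the Resource Group "MSDTC".
NODE01 17:26:39 The Cluster Service brought the Resource Group "SQL01" offline.
NODE02 17:26:47 The Cluster Service brought the Resource Group "MSDTC" online.
NODE01 17:27:00 The Cluster Service is attempting to offline the Resource Group "SQL02".
NODE02 17:27:00 The Cluster Service is attempting to bring online the Resource Group "SQL01".
NODE01 17:27:08 The Cluster Service brought the Resource Group "SQL02" offline.
NODE02 17:27:11 The Cluster Service brought the Resource Group "SQL01" online.
NODE02 17:27:28 The Cluster Service is attempting to bring online the Resource Group "SQL02".
NODE02 17:27:39 The Cluster Service brought the Resource Group "SQL02" online.
"""

# node, time, message text before the quoted group name, group name, trailing text
LINE = re.compile(r'^(\S+) (\d{2}:\d{2}:\d{2}) (.*)"([^"]+)"\s*(.*)$')


def stage(msg, tail):
    """Map one event message to one of the four stages of a group move."""
    if "attempting to offline" in msg:
        return "offline_start"
    if "attempting to bring online" in msg:
        return "online_start"
    if tail.startswith("offline"):
        return "offline_done"
    if tail.startswith("online"):
        return "online_done"
    raise ValueError("unrecognised event: " + msg + tail)


def parse_events(text):
    """Return {group name: {stage: timestamp}} for every recognised line."""
    groups = {}
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        _node, clock, msg, group, tail = match.groups()
        when = datetime.strptime(clock, "%H:%M:%S")
        groups.setdefault(group, {})[stage(msg, tail)] = when
    return groups


def phase_seconds(stages):
    """Split one group move into offline, gap, online and total seconds."""
    def sec(start, end):
        return int((stages[end] - stages[start]).total_seconds())
    return {
        "offline": sec("offline_start", "offline_done"),
        "gap": sec("offline_done", "online_start"),
        "online": sec("online_start", "online_done"),
        "total": sec("offline_start", "online_done"),
    }


def overall_seconds(groups):
    """Seconds from the first offline attempt to the last group online."""
    first = min(s["offline_start"] for s in groups.values())
    last = max(s["online_done"] for s in groups.values())
    return int((last - first).total_seconds())


if __name__ == "__main__":
    parsed = parse_events(EVENTS)
    for name, stages in parsed.items():
        print(name, phase_seconds(stages))
    print("overall", overall_seconds(parsed))
```

On the figures above, this breakdown puts 20 to 25 seconds of each move
in the gap between the group going offline on NODE01 and NODE02
starting to bring it online, so that gap is larger than either the
offline or the online phase in all four moves.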

By the way, you can see here why I was talking about 6 moves: the 4
moves above should be followed by 2 more in order to distribute the
groups equally between the two servers. We did not perform the last 2
moves because of our lack of confidence in the NODE01 server, which
has crashed twice since the beginning of the year :-(

In summary, I am looking for:
(1) Advice on the most efficient way to move all the groups of a
cluster;
(2) Similar Event Log analyses to compare against.

Finally, I need to emphasize that I have no clue about the SQL "load"
during the failover. I guess it would be very interesting to have a
graph of failover duration versus load :-))

Many thanks in advance and best regards,

Patrice


