Re: Is Alive = Failing over



Hi,

Exactly as Geoff said, the Is-Alive check is a lightweight test. From a SQL
Server perspective, the node hosting the SQL Server resource does a
LooksAlive check every 5 seconds. This is a lightweight check to see
whether the service is running, and it may succeed even if the instance of
SQL Server is not operational. The IsAlive check runs every 60 seconds and
verifies that SQL Server is actually operational by running a query
(SELECT @@SERVERNAME, as Tim saw in his trace). If this query fails, the
IsAlive check retries five times and then attempts to reconnect to the
instance of SQL Server. If all five retries fail, the SQL Server resource
fails. The intervals can be changed through Cluster Administrator on the
Advanced tab of the SQL Server resource properties; the default LooksAlive
interval is 5000 milliseconds and the default IsAlive interval is 60000
milliseconds.
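
To make the retry behaviour described above concrete, here is a minimal Python sketch. It is only an illustration of the description in this post, not the cluster service's actual code: the names `is_alive`, `run_query` and `make_probe` are hypothetical, the probe stands in for the `select @@servername` query that the original poster saw in his trace, and the reconnect step is left out.

```python
from typing import Callable, Dict, Tuple

# Defaults quoted in this post.
LOOKS_ALIVE_INTERVAL_MS = 5000
IS_ALIVE_INTERVAL_MS = 60000
IS_ALIVE_RETRIES = 5


def is_alive(run_query: Callable[[], bool], retries: int = IS_ALIVE_RETRIES) -> bool:
    """Hypothetical model: one initial probe, then up to `retries` more attempts."""
    if run_query():
        return True
    for _ in range(retries):
        if run_query():
            return True
    # The initial probe and every retry failed: the resource is treated as failed.
    return False


def make_probe(failures_before_success: int) -> Tuple[Callable[[], bool], Dict[str, int]]:
    """Build a fake probe that fails a fixed number of times, then succeeds."""
    calls = {"count": 0}

    def probe() -> bool:
        calls["count"] += 1
        return calls["count"] > failures_before_success

    return probe, calls


if __name__ == "__main__":
    probe, calls = make_probe(3)          # a briefly stressed server
    print(is_alive(probe), calls["count"])
    probe, calls = make_probe(100)        # a server that never answers
    print(is_alive(probe), calls["count"])
```

In this model a server that misses a few probes and then answers still passes the check; only a server that misses the initial probe and all five retries is marked as failed.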

You can check the following article:

<http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/failclus.mspx#XSLTsection125121120120>


Best regards,

Vincent Xu
Microsoft Online Partner Support

======================================================
Get Secure! - www.microsoft.com/security
======================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from this issue.
======================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
======================================================



--------------------
From: "Geoff N. Hiten" <SQLCraftsman@xxxxxxxxx>
References: <lFvrg.199$Js2.156@xxxxxxxxxxxxxxxx>
Subject: Re: Is Alive = Failing over
Date: Fri, 7 Jul 2006 12:44:53 -0400
Newsgroups: microsoft.public.sqlserver.clustering

There is actually an algorithm for the Looks-Alive and Is-Alive failure
sequence that the cluster service goes through before marking a service as
"failed" and attempting a cluster recovery. It takes multiple failures to
complete the sequence. The sequence was devised in an attempt to balance
Type I and Type II errors: that is, avoiding a failover when one was not
necessary (Type I) while not missing any real failures (Type II). The
checks are performed as a client would view the server, because that is
the most representative way of examining the system.
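
As a rough illustration of why requiring multiple failures balances the two error types, here is a small Python sketch. It is a toy model, not the cluster service's algorithm: the function names are hypothetical, and it assumes each check fails spuriously with the same independent probability, which will not hold on a server under sustained load (failures there tend to come in bursts).

```python
def false_failover_probability(p_spurious: float, consecutive_failures: int) -> float:
    """Chance that every one of n independent checks fails spuriously (Type I)."""
    if not 0.0 <= p_spurious <= 1.0:
        raise ValueError("p_spurious must be between 0 and 1")
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be at least 1")
    return p_spurious ** consecutive_failures


def detection_delay_ms(interval_ms: int, consecutive_failures: int) -> int:
    """Rough time to notice a real failure if n checks, one per interval, must fail."""
    return interval_ms * consecutive_failures


if __name__ == "__main__":
    # Illustrative numbers only: a 10% spurious failure rate per check.
    for n in (1, 3, 6):
        print(n, false_failover_probability(0.1, n), detection_delay_ms(60000, n))
```

Under these assumptions, each extra failure required makes an unnecessary failover less likely but makes a real failure take longer to detect, which is the trade-off the sequence is tuned for.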

If your cluster is failing over due to non-responsiveness to the
Looks-Alive and Is-Alive checks, then by definition it is failing over
correctly. You can adjust the timing using the cluster tool, but I highly
recommend not doing so. You will likely cause your system to not fail over
when you actually need it to. You would be better off finding out why the
server cannot respond to such a lightweight test as the cluster heartbeat
tests.

If you insist on shooting yourself in the foot, here is exactly how you do
it: Open the Cluster Administrator tool. Open the Resources folder.
Right-click on the SQL Server resource and choose Properties. Select the
'Advanced' tab. You can override the "Looks Alive" and "Is Alive" polling
intervals here.

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP






"Tim" <tim@xxxxxxxxxx> wrote in message
news:lFvrg.199$Js2.156@xxxxxxxxxxxxxxxxxxx
This is what is happening:
The cluster service runs the "Is Alive" check against SQL Server every
minute. I validated this by profiling the SQL Server and seeing the
"select @@servername" command being executed by the cluster service every
minute.
There are times when the server is stressed, so I believe connections are
getting refused or delayed, and some of these connections might be the
"Is Alive" check.
Thus, the cluster service thinks there is something wrong with SQL Server
and either restarts it or fails it over.

Is there a threshold setting that can be set, such as "after 10 failed
'Is Alive' checks within 1 hour, fail over or restart"? Or what other
options do I have? We are in the process of trying to performance tune
the server, but this might take weeks to complete. In the meantime, this
affects our production cluster.




