Re: Cluster says that service is up but it isn't




Difficult to guess what is happening with only snippets of the logs. At
8:27, cluster failed the resource and then restarted it. At 10:36, you
didn't quote an ERR message...did someone manually bring the resource
offline?

There are also possibilities of thresholds being hit. Is this service
failing frequently? Eventually, MSCS will stop attempting to bring the
resource back online if the resource threshold limits are met.
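To make the threshold idea concrete, here is a toy model in Python. It is only a sketch of the general "too many failures inside one time window" rule, not the actual MSCS implementation; the class name, the parameter names, and the values 3 and 900 are all assumptions chosen for illustration.

```python
from collections import deque


class RestartPolicy:
    """Toy model: allow restarts until too many failures fall inside one window."""

    def __init__(self, threshold: int, period_s: float):
        self.threshold = threshold    # hypothetical: failures tolerated per window
        self.period_s = period_s      # hypothetical: window length in seconds
        self.failures = deque()       # timestamps of recent failures

    def on_failure(self, now_s: float) -> bool:
        """Record a failure; return True if a restart should still be attempted."""
        self.failures.append(now_s)
        # Forget failures that are older than the window.
        while self.failures and now_s - self.failures[0] > self.period_s:
            self.failures.popleft()
        return len(self.failures) <= self.threshold


policy = RestartPolicy(threshold=3, period_s=900)
for t in (0, 100, 200, 300, 2000):
    print(t, policy.on_failure(t))
```

In this model the fourth failure inside the window is the first one that does not get a restart, and a failure that arrives after the window has passed is restarted again. If the real resource behaves like that, the cluster log should show repeated failures shortly before the point where restarts stop.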

Also, is the service depending on any other resources? If a resource that
the service depends on should fail, cluster would not necessarily bring the
resource back online.

Regards,
John

Visit my blog: http://msmvps.com/blogs/jtoner

"??? ?." <@discussions.microsoft.com> wrote in message
news:FA570F8E-56EF-4817-A1D0-CE3DBB444FF8@xxxxxxxxxxxxxxxx
Hello.
Our application runs in a Windows 2003 cluster environment.
One of our resources is a service.
Most of the time, when the service goes down, the cluster is aware of it
and restarts it automatically, but sometimes it just does not bring it up.
I have gone through the cluster log files and found that the cluster did
recognize that the service had died and restarted it. In fact, though, the
service was not restarted at all until 3 hours later, when a manual restart
occurred.
The cluster log shows the service as up and running, but it is not.
I am attaching my evaluation. The service name is "Opnotes".
Please see if you can help me with this weird phenomenon.
I can also send the full log file by mail if anyone needs it.
Thanks.
*****************************************
The evaluation:
The cluster log is written in GMT (+0), so its timestamps are 7 hours ahead
of the Opnotes log files.
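The conversion can be sketched in Python as follows. The function name and the constant are made up for this example, and the -7 hour offset is an assumption taken from the times quoted in this post (15:27 in the cluster log corresponding to 08:27 locally).

```python
from datetime import datetime, timedelta

# Assumption: local (Opnotes) time is 7 hours behind the cluster log's GMT time.
LOCAL_OFFSET = timedelta(hours=-7)


def cluster_to_local(stamp: str) -> datetime:
    """Convert a cluster log timestamp (GMT) to the Opnotes local time."""
    gmt = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f")
    return gmt + LOCAL_OFFSET


print(cluster_to_local("2008/10/14-15:27:15.317"))
# prints 2008-10-14 08:27:15.317000
```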

Looking into the Cluster log of 293, we can clearly see the service stop
and restart around 10/14/2008 8:27 and 10:36.
According to the log, the service was terminated at 08:27 (add 7 hours for
GMT):
00000adc.00000ae8::2008/10/14-15:27:15.317 ERR Generic Service <E&C OpNotes>: Failed the IsAlive test. Current State is 1.
00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C OpNotes>: Terminate request.
00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C OpNotes>: GenSvcTerminate : calling SCM (didStop=0)
00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C OpNotes>: Service died; status = 1062.

It was then immediately reported as running, but the Opnotes service was
not running:
00000adc.00001360::2008/10/14-15:27:15.567 INFO Generic Service <E&C OpNotes>: Service is now running.

However, at 10:36 it is considered stopped again:
00000adc.000006ec::2008/10/14-17:36:35.837 INFO Generic Service <E&C OpNotes>: Service died or not active any more; status = 1062.
00000adc.000006ec::2008/10/14-17:36:35.837 INFO Generic Service <E&C OpNotes>: Service is now offline.

Then, after several actions, it is reported as running again at 10:36 (add
7 hours for GMT), and this time the Opnotes service really was running:
00000adc.00001420::2008/10/14-17:36:46.368 INFO Generic Service <E&C OpNotes>: Service is now running.
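
To line the events up, the log lines quoted above can be parsed with a short Python script. This is only a sketch: the regular expression is written to fit these particular lines (a hex prefix that looks like process.thread, a timestamp, a level, and the resource name in angle brackets) and may not cover every line in a full cluster log.

```python
import re
from datetime import datetime

# Four of the log lines quoted above, with the wrapped halves rejoined.
LOG = """\
00000adc.00000ae8::2008/10/14-15:27:15.317 ERR Generic Service <E&C OpNotes>: Failed the IsAlive test. Current State is 1.
00000adc.00001360::2008/10/14-15:27:15.567 INFO Generic Service <E&C OpNotes>: Service is now running.
00000adc.000006ec::2008/10/14-17:36:35.837 INFO Generic Service <E&C OpNotes>: Service died or not active any more; status = 1062.
00000adc.00001420::2008/10/14-17:36:46.368 INFO Generic Service <E&C OpNotes>: Service is now running.
"""

# prefix.prefix::timestamp LEVEL resource-type <resource name>: message
LINE = re.compile(
    r"^(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)::(?P<ts>\S+)\s+(?P<level>\w+)\s+"
    r"(?P<type>.+?) <(?P<name>[^>]+)>: (?P<msg>.*)$"
)


def parse(text):
    """Return (timestamp, level, resource name, message) for each matching line."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f")
            events.append((ts, m["level"], m["name"], m["msg"]))
    return events


events = parse(LOG)
for ts, level, name, msg in events:
    print(ts, level, name, msg)

# Time between the first "now running" report and the next "died" report.
gap = events[2][0] - events[1][0]
print("gap:", gap)  # prints gap: 2:09:20.270000
```

By these timestamps, the cluster treated the service as running for a little over two hours between the two events.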

Open questions:
Why was Opnotes not restarted at 08:27, while the cluster considered it
running?
What is the difference between the 08:27 and 10:36 restarts? Was the 08:27
one an automatic restart by the cluster, while at 10:36 the user himself
did the restart from the cluster admin?
Why does the cluster consider the service dead again only at 10:36? What
happened since 08:27?

Eli.


