Re: Cluster says that service is up but it isn't
- From: "John Toner [MVP]" <jtoner@xxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 16 Oct 2008 17:36:17 -0400
Difficult to guess what is happening with only snippets of the logs. At
8:27, cluster failed the resource and then restarted it. At 10:36, you
didn't quote an ERR message...did someone manually bring the resource
offline?
It is also possible that a restart threshold is being hit. Is this service
failing frequently? Eventually, MSCS will stop attempting to bring the
resource back online once the resource's threshold limits are met.
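As a rough illustration of the threshold idea, the sketch below models a restart policy with a threshold and a period. It is a simplified model, not how MSCS is actually implemented; the function name and the values 3 and 900 seconds are assumptions for illustration only, so check the real settings on the resource itself.

```python
from datetime import datetime, timedelta

def should_restart(failure_times, now, threshold=3, period=timedelta(seconds=900)):
    """Return True if a restart would still be attempted in this simplified model.

    failure_times: datetimes of earlier failures of the resource.
    threshold, period: hypothetical values, not read from any real cluster.
    """
    window_start = now - period
    # Only failures inside the sliding window count against the threshold.
    recent = [t for t in failure_times if window_start <= t <= now]
    # The failure being handled right now counts as one more.
    return len(recent) + 1 <= threshold

now = datetime(2008, 10, 14, 15, 27, 15)
earlier = [now - timedelta(minutes=m) for m in (2, 5, 9)]
print(should_restart(earlier, now))      # three recent failures already
print(should_restart(earlier[:1], now))  # only one recent failure
```

In this model a service that fails often within the period stops being restarted, while an occasional failure is always retried.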
Also, is the service depending on any other resources? If a resource that
the service depends on should fail, cluster would not necessarily bring the
resource back online.
Regards,
John
Visit my blog: http://msmvps.com/blogs/jtoner
"??? ?." <@discussions.microsoft.com> wrote in message
news:FA570F8E-56EF-4817-A1D0-CE3DBB444FF8@xxxxxxxxxxxxxxxx
Hello.
Our application is running in a Windows 2003 cluster environment.
One of our resources is a service.
Most of the time, when the service goes down, the cluster is aware of it and
restarts it automatically,
but sometimes it just does not bring it up.
I have explored the cluster log files and found that the cluster did
recognize that the service had died and restarted it, but in fact the
service was not restarted at all until 3 hours later, when a manual restart
occurred.
The cluster log shows that the service is up and running, but it just
is not.
I am attaching my evaluation. The service name is "Opnotes".
Please see if you can help me with this weird phenomenon.
I can also send the full log file by mail if someone needs it.
Thanks.
*****************************************
the evaluation:
The cluster log file is written in GMT (+0),
so its timestamps are 7 hours ahead of those in the Opnotes log files.
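To line up the two logs, the cluster timestamps can be shifted by that offset. A minimal sketch, assuming the timestamp format seen in the cluster log excerpts and a fixed 7-hour difference; the names `LOCAL_OFFSET` and `cluster_to_local` are made up for illustration:

```python
from datetime import datetime, timedelta

# Assumption: the application logs run 7 hours behind the cluster log (GMT).
LOCAL_OFFSET = timedelta(hours=-7)

def cluster_to_local(stamp):
    """Convert a cluster log timestamp such as '2008/10/14-15:27:15.317' to local time."""
    gmt = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f")
    return gmt + LOCAL_OFFSET

print(cluster_to_local("2008/10/14-15:27:15.317").strftime("%H:%M:%S"))  # 08:27:15
print(cluster_to_local("2008/10/14-17:36:35.837").strftime("%H:%M:%S"))  # 10:36:35
```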
When looking into the cluster log of 293 we can clearly see the service stop
and restart around 10/14/2008 8:27 and 10:36 (+7 hours to GMT).
According to the log, the service was terminated at 08:27 (+7 hours to GMT):
00000adc.00000ae8::2008/10/14-15:27:15.317 ERR Generic Service <E&C
OpNotes>: Failed the IsAlive test. Current State is 1.
00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C
OpNotes>: Terminate request.
00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C
OpNotes>: GenSvcTerminate : calling SCM (didStop=0)
00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C
OpNotes>: Service died; status = 1062.
Then it was immediately set to running, but the Opnotes service was not
running:
00000adc.00001360::2008/10/14-15:27:15.567 INFO Generic Service <E&C
OpNotes>: Service is now running.
However, at 10:36 it is considered stopped again:
00000adc.000006ec::2008/10/14-17:36:35.837 INFO Generic Service <E&C
OpNotes>: Service died or not active any more; status = 1062.
00000adc.000006ec::2008/10/14-17:36:35.837 INFO Generic Service <E&C
OpNotes>: Service is now offline.
and then, after several actions, it is running again at 10:36 (+7 hours to
GMT), and the Opnotes service was running:
00000adc.00001420::2008/10/14-17:36:46.368 INFO Generic Service <E&C
OpNotes>: Service is now running.
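The excerpts above can also be checked mechanically. The sketch below parses the quoted lines (joined back onto one line each) and measures how long the cluster believed the service was running before it logged the next "Service died" message. The regular expression is inferred only from the lines quoted here, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the quoted excerpts; other cluster.log lines may differ.
LINE_RE = re.compile(
    r"^[0-9a-f]{8}\.[0-9a-f]{8}::"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"[^<]*<(?P<resource>[^>]+)>:\s*(?P<message>.*)$"
)

def parse_line(line):
    """Return (time, level, resource, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m.group("stamp"), "%Y/%m/%d-%H:%M:%S.%f")
    return when, m.group("level"), m.group("resource"), m.group("message")

def believed_running_spans(lines, resource):
    """Yield (start, end) from 'now running' to the next 'Service died' message."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[2] != resource:
            continue
        when, _level, _resource, message = parsed
        if message.startswith("Service is now running"):
            start = when
        elif message.startswith("Service died") and start is not None:
            yield start, when
            start = None

SAMPLE = [
    "00000adc.00000ae8::2008/10/14-15:27:15.317 ERR Generic Service <E&C OpNotes>: Failed the IsAlive test. Current State is 1.",
    "00000adc.00000a50::2008/10/14-15:27:15.379 INFO Generic Service <E&C OpNotes>: Service died; status = 1062.",
    "00000adc.00001360::2008/10/14-15:27:15.567 INFO Generic Service <E&C OpNotes>: Service is now running.",
    "00000adc.000006ec::2008/10/14-17:36:35.837 INFO Generic Service <E&C OpNotes>: Service died or not active any more; status = 1062.",
    "00000adc.00001420::2008/10/14-17:36:46.368 INFO Generic Service <E&C OpNotes>: Service is now running.",
]

for start, end in believed_running_spans(SAMPLE, "E&C OpNotes"):
    print(start, "->", end, "duration", end - start)
```

Run against the full log rather than these five lines, the same loop would show every interval in which the cluster reported the service as running, which can then be compared with the Opnotes log.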
Open questions:
Why was Opnotes not restarted at 08:27, while the cluster considered it as
running?
What is the difference between the 08:27 and 10:36 restarts? Was the 08:27
one an automatic restart by the cluster, while at 10:36 the user himself did
the restart from the cluster admin?
Why does the cluster consider the service dead again only at 10:36? What
happened since 08:27?
Eli.