Re: Lost Quorum

From: Mike Rosado [MSFT] (mikeros_at_online.microsoft.com)
Date: 11/03/04


Date: Wed, 3 Nov 2004 17:55:32 -0600

Hi Malle,

I'm by no means an expert on Veritas Volume Manager, but I'll try to assist
you to the best of my ability.

Here's your problem: error 995, which caused the reservation on the
Quorum disk to be lost.

000007e8.0000091c::2004/11/01-06:34:01.542 Physical Disk <Disk Q:>:
[DiskArb] error checking disk reservation thread, error 995.
000007e8.0000091c::2004/11/01-06:34:01.557 Physical Disk <Disk Q:>:
[DiskArb] CompletionRoutine: reservation lost!
000007e8.0000091c::2004/11/01-06:34:01.557 [RM] RmpLostQuorumResource,
cluster service terminated...
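
If you want to pull these entries out of a larger cluster log yourself, a rough sketch along the following lines may help. The line layout (hex process ID, hex thread ID, timestamp, message) is only inferred from the excerpt in this thread, and the names `LINE_RE`, `parse_line` and `find_reservation_events` are made up for illustration; they are not part of any Microsoft tool. The sample lines are the three entries above with the newsreader line wraps joined back together.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt in this thread:
#   <hex process id>.<hex thread id>::<YYYY/MM/DD-HH:MM:SS.mmm> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log entry, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m.group("pid"), 16),
        "tid": int(m.group("tid"), 16),
        "time": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        "msg": m.group("msg"),
    }


def find_reservation_events(lines):
    """Yield parsed entries whose message carries the [DiskArb] tag."""
    for line in lines:
        entry = parse_line(line)
        if entry and "[DiskArb]" in entry["msg"]:
            yield entry


# The three entries quoted above, with the line wraps joined.
sample = [
    "000007e8.0000091c::2004/11/01-06:34:01.542 Physical Disk <Disk Q:>: "
    "[DiskArb] error checking disk reservation thread, error 995.",
    "000007e8.0000091c::2004/11/01-06:34:01.557 Physical Disk <Disk Q:>: "
    "[DiskArb] CompletionRoutine: reservation lost!",
    "000007e8.0000091c::2004/11/01-06:34:01.557 [RM] RmpLostQuorumResource, "
    "cluster service terminated...",
]

for e in find_reservation_events(sample):
    print(e["time"].isoformat(), e["msg"])
```

Filtering on the `[DiskArb]` tag is just one choice; the same loop could match `[RM]` or a resource name instead.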

Here's what error 995 means:

# for decimal 995 / hex 0x3e3 :
  ERROR_OPERATION_ABORTED
# The I/O operation has been aborted because of either a
# thread exit or an application request.
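
For reference, the decimal/hex conversion above can be reproduced with a few lines of Python. This is only a sketch: the `WIN32_ERRORS` table and the `describe` helper are hypothetical names, and the table is hand-written and covers just the two codes that show up in this thread (995 here, and 87 / 0x57 in the log you posted).

```python
# Hand-written table of the Win32 error codes seen in this thread.
# It is deliberately incomplete; it is not a full list of system errors.
WIN32_ERRORS = {
    87: "ERROR_INVALID_PARAMETER",
    995: "ERROR_OPERATION_ABORTED",
}


def describe(code):
    """Format an error code as decimal and hex, with its symbolic name."""
    name = WIN32_ERRORS.get(code, "unknown (not in this table)")
    return f"decimal {code} / hex 0x{code:x} : {name}"


print(describe(995))   # decimal 995 / hex 0x3e3 : ERROR_OPERATION_ABORTED
print(describe(0x57))  # decimal 87 / hex 0x57 : ERROR_INVALID_PARAMETER
```

On a Windows machine, `net helpmsg 995` at a command prompt should print the system's own message text for the code.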

Unfortunately, we don't support Dynamic Disks on a Cluster, which is what it
looks like you are doing with Veritas Volume Manager. As stated in the
excerpt from the article below, Veritas should be your first point of contact
when encountering problems. Have you already contacted Veritas?

237853 Dynamic Disk Configuration Unavailable for Server Cluster Disk
Resources
http://support.microsoft.com/?id=237853

When you install the Veritas Volume Manager product on a cluster and
configure Volume Manager Disk Group resources, Veritas is the first point of
support for cluster issues related to those resources.

-- 
Hope this helps,
Mike Rosado
Windows 2000 MCSE + MCDBA
Microsoft Enterprise Platform Support
Windows NT/2000/2003 Cluster Technologies
====================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
<http://www.microsoft.com/info/cpyright.htm>
-----Original Message-----
"Malle" <Malle@discussions.microsoft.com> wrote in message
news:D1009367-B7AC-402C-929E-706C8D6D99DF@microsoft.com...
> We have a 2 node w2k cluster attached via Fibre Channel to our SAN storage.
> We use Veritas Volume Manager 3.1 on both nodes.
> We noticed a failover and our customer wants to know the reason :-/
> We found event IDs 1000, 1015 and 1038 in the system log, and it seems to be
> a problem with the quorum.
> We checked the cluster log and found the following errors:
> 00006f4.000007a8::2004/11/01-06:34:00.807 [DM]DmpCheckpointTimerCb- taking a
> checkpoint
> 000006f4.000007a8::2004/11/01-06:34:00.807 [LM] LogReset entry...
> 000006f4.000007a8::2004/11/01-06:34:00.807 [LM] LogpReset entry...
> 000006f4.000007a8::2004/11/01-06:34:00.807 [LM] LogpCreate : Entry
> 000006f4.000007a8::2004/11/01-06:34:00.807 [LM] LogpMountLog : Entry
> pLog=0x000c5c08
> 000006f4.000007a8::2004/11/01-06:34:00.807 [LM] LogpMountLog::Quorumlog File
> size=0x00000000
> 000006f4.000007a8::2004/11/01-06:34:00.807 [LM] LogpInitLog : Entry
> pLog=0x000c5c08
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogpAppendPage : Writing
> 1024 bytes to disk at offset 0x00000000
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogpInitLog :
> NextLsn=0x00000408 FileAlloc=0x00000800 ActivePageOffset=0x00000400
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogpCreate : Exit with
> success
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogGetLastChkPoint:: Entry
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogGetLastChkPoint exit,
> returning 0x000013a8
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogReset:: no check point
> found in the old log file
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogCheckPoint entry
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogpReset:: Callback failed
> to return a checkpoint, error=87
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogClose : Entry
> LogFile=0x000c5c08
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogFlush : pLog=0x000c5c08
> writing the 1024 bytes for active page at offset 0x00000400
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogClose : Exit returning
> success
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogpReset exit, returning
> 0x00000057
> 000006f4.000007a8::2004/11/01-06:34:00.823 [LM] LogReset exit, returning
> 0x00000057
> 000006f4.000007a8::2004/11/01-06:34:00.823 [DM]DmpCheckpointTimerCb - Failed
> to reset log, error=87
> 000006f4.000007a8::2004/11/01-06:34:00.839 Microsoft Clustering Service
> suffered an unexpected fatal error
> at line 2166 of source module D:\nt\private\cluster\service\dm\dmlog.c. The
> error code was 87.
> 000007e8.000007cc::2004/11/01-06:34:01.542 [RM] Going away, Status = 1,
> Shutdown = 0.
> 000007e8.000007cc::2004/11/01-06:34:01.542 [RM] RmpRundownResources,
> terminate resource <Logistik>...
> 000007e8.000007cc::2004/11/01-06:34:01.542 File Share <Logistik>:
> SmbShareDoTerminate: SmbpShareNotifyWorker Terminated... !!!
> 000007e8.0000091c::2004/11/01-06:34:01.542 Physical Disk <Disk Q:>:
> [DiskArb] CompletionRoutine, status 0.
> 000007e8.0000091c::2004/11/01-06:34:01.542 Physical Disk <Disk Q:>:
> [DiskArb] posting AsyncCheckReserve request.
> 000007e8.0000091c::2004/11/01-06:34:01.542 Physical Disk <Disk Q:>:
> [DiskArb] error checking disk reservation thread, error 995.
> 000007e8.0000091c::2004/11/01-06:34:01.557 Physical Disk <Disk Q:>:
> [DiskArb] CompletionRoutine: reservation lost!
> 000007e8.0000091c::2004/11/01-06:34:01.557 [RM] RmpLostQuorumResource,
> cluster service terminated...
> 000007e8.000007cc::2004/11/01-06:34:01.557 [RM] RmpRundownResources, close
> resource <Logistik>...
> 000007e8.000007cc::2004/11/01-06:34:01.557 [RM] RmpRundownResources,
> terminate resource <PROBASTEST>...
> 000007e8.000007cc::2004/11/01-06:34:01.557 File Share <PROBASTEST>:
> SmbShareDoTerminate: SmbpShareNotifyWorker Terminated... !!!
> 000007e8.000007cc::2004/11/01-06:34:01.557 [RM] RmpRundownResources, close
> resource <PROBASTEST>...
> 000007e8.000007cc::2004/11/01-06:34:01.557 [RM] RmpRundownResources,
> terminate resource <NetworkerRemoteExec>...
> 000007e8.000007cc::2004/11/01-06:34:01.557 Generic Service
> <NetworkerRemoteExec>: Terminate request.
> 000007e8.000007cc::2004/11/01-06:34:01.557 Generic Service
> <NetworkerRemoteExec>: GenSvcTerminate : calling SCM
> 000007e8.00000920::2004/11/01-06:34:01.573 Physical Disk: PnP Event
> GUID_IO_VOLUME_DISMOUNT for 619968 received
> 000007e8.000007cc::2004/11/01-06:35:01.728 Generic Service
> <NetworkerRemoteExec>: GenSvcTerminate: retrying...
> 000020a4.000018c8::2004/11/01-06:35:01.822
>
> 000020a4.000018c8::2004/11/01-06:35:01.822 [CS] Cluster Service started -
> Cluster Node Version 3.2195
> 000020a4.000018c8::2004/11/01-06:35:01.822 OS Version 5.0.2195 - Service Pack 4 (AS)
>
> 000020a4.0000207c::2004/11/01-06:35:01.822 [CS] Service Starting...
>
> Does anyone have an idea what happened?
>

