Disk failure and event id 1146



Hello all, this is my environment:

2 nodes MSCS with Windows Server 2003 SP1.
Node 1 host SQL Server, Cluster group, File system and Lotus Domino
Node 2 host Oracle 9i in a failsafe confoguration.
IBM SAN DS4400 with firmware code and device drivers for all devices at n-1.

This is not always happening but I would like to share this strange scenario
with you:

When I decide to "take one group offline", in this case the Oracle group, I
end up in this situation:

the 2 disks of the Oracle group (T:\ and U:\) gets failed.

In the system event log I receive the following errors:
"The cluster resource monitor died unexpectedly, an attempt will be made to
restart it."

This error is followed by a series of errors like this one:

"The system failed to flush data to the transaction log. Corruption may
occur."

In the cluster log I have:
"00000918.000016f0::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
Offline, ResourceEntry @ 000C4A90 Valid 1
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [FM] FmpRmOfflineResource:
RmOffline() for 85f5dcef-4d92-4734-b330-01c0c9ba46ac returned error 997
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate: queuing
update type 0 context 8
00000918.000013c8::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
[PnP] Stop watching PnP events for disk a184bf5a
00000918.000013c8::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
[PnP] Stop watching disk a184bf5a - processed
00000918.000016f0::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
Offline, Processing disk number 7.
00000918.000016f0::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
Offline, Using handle array.
00000918.000016f0::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
Offline, Opening device \Device\Harddisk7\Partition1.
00000918.000016f0::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
Offline, FlushFileBuffers for \Device\Harddisk7\Partition1.
00000918.000016f0::2007/01/05-17:09:30.516 INFO Physical Disk <Disk U:>:
Offline, Locking volume for \Device\Harddisk7\Partition1.
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate:
Dispatching seq 160634 type 0 context 8 to node 2
00000918.000013c8::2007/01/05-17:09:30.516 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK for U (Partition1) - Received
00000918.000013c8::2007/01/05-17:09:30.516 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK for U (Partition1) - Processed
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate:
completed update seq 160634 type 0 context 8
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [FM]
FmpPropagateResourceState: resource 85f5dcef-4d92-4734-b330-01c0c9ba46ac
pending event.
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [FM]
FmpSetResourcePersistentState: Setting persistent state for resource
85f5dcef-4d92-4734-b330-01c0c9ba46ac...
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate: queuing
update type 1 context 4098
00000918.000013c8::2007/01/05-17:09:30.516 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK_FAILED for U (Partition1) - Received
00000918.000013c8::2007/01/05-17:09:30.516 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK_FAILED for U (Partition1) - Processed
00000918.000016f0::2007/01/05-17:09:30.516 WARN Physical Disk <Disk U:>:
Offline, Locking volume failed, error 5.
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate:
Dispatching seq 160635 type 1 context 4098 to node 2
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [DM] DmWriteToQuorumLog
Entry Seq#=160635 Type=4098 Size=152
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [DM] DmUpdateSetValue
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [DM] Setting value of
PersistentState for key Resources\85f5dcef-4d92-4734-b330-01c0c9ba46ac to
0x00000000
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [DM] DmWriteToQuorumLog
Entry Seq#=160635 Type=4098 Size=152
000006c4.00000a34::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate:
completed update seq 160635 type 1 context 4098
000006c4.0000084c::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate: queuing
update type 0 context 11
000006c4.0000084c::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate:
Dispatching seq 160636 type 0 context 11 to node 2
000006c4.0000084c::2007/01/05-17:09:30.516 INFO [GUM] GumSendUpdate:
completed update seq 160636 type 0 context 11
000006c4.0000084c::2007/01/05-17:09:30.516 INFO [FM]
FmpRmChangeResourceParams for resource <85f5dcef-4d92-4734-b330-01c0c9ba46ac>
called...
00000918.000013c8::2007/01/05-17:09:31.125 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK for U (Partition1) - Received
00000918.000013c8::2007/01/05-17:09:31.125 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK for U (Partition1) - Processed
00000918.000013c8::2007/01/05-17:09:31.125 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK_FAILED for U (Partition1) - Received
00000918.000013c8::2007/01/05-17:09:31.125 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_LOCK_FAILED for U (Partition1) - Processed
00000918.000016f0::2007/01/05-17:09:31.125 WARN Physical Disk <Disk U:>:
Offline, Locking volume failed, error 5.
00000918.000016f0::2007/01/05-17:09:31.719 INFO Physical Disk <Disk U:>:
Offline, Dismounting volume \Device\Harddisk7\Partition1.
00000918.000013c8::2007/01/05-17:09:31.719 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_DISMOUNT for U (Partition1) - Received
00000918.000013c8::2007/01/05-17:09:31.719 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_DISMOUNT for U (Partition1) - Processed
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
Offline, Dismount complete, volume \Device\Harddisk7\Partition1.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
DiskCleanup started.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[DiskArb] StopPersistentReservations is called.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[DiskArb] Stopping reservation thread.
00000918.00000cc4::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[DiskArb] CompletionRoutine, status 0.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[ArbCleanup] Verifying sector size.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[ArbCleanup] Reading arbitration block.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[DiskArb] Successful read (sector 12) [SRV0002B:120604]
(0,5f5f2dea:01c72da1).
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[ArbCleanup] Writing arbitration block.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[DiskArb] Successful write (sector 12) [:0] (0,00000000:00000000).
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[ArbCleanup] Returning status 0.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
[DiskArb] StopPersistentReservations is complete.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
DisksDismountDrives: letter mask is 00100000.
00000918.000016f0::2007/01/05-17:09:31.735 INFO Physical Disk <Disk U:>:
DiskCleanup returning final error 0
00000918.000013c8::2007/01/05-17:09:31.735 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_UNLOCK for U (Partition1) - Received
00000918.000013c8::2007/01/05-17:09:31.735 INFO Physical Disk: [PnP] Event
GUID_IO_VOLUME_UNLOCK for U (Partition1) - Processed
00000918.000016f0::2007/01/05-17:09:31.750 INFO Physical Disk <Disk U:>:
Offline, Returning final error 0.
00000918.000016f0::2007/01/05-17:09:31.750 INFO [RM] RmpSetResourceStatus,
Posting state 3 notification for resource <Disk U:>
000006c4.00001798::2007/01/05-17:09:31.750 INFO [FM] NotifyCallBackRoutine:
enqueuing event
000006c4.00001798::2007/01/05-17:09:31.750 INFO [FM] Calling RmNotifyChanges
in monitor 0918.
000006c4.00000848::2007/01/05-17:09:31.750 INFO [CP] CppResourceNotify for
resource Disk U:"

I do not have more than 100 disks, (http://support.microsoft.com/kb/314753),
so this is not my situation.

I do not find in the cluster log the following note:
"WARN Network Name Enterprise Network Name: Failed to obtain
IDirectoryObject pointer to computer account ServerName, status 80005000"
so this is not also my case (http://support.microsoft.com/kb/886652)

Any clue?
Thank you very much and kind regards.





Franco
.



Relevant Pages

  • Re: Strange failover problem on new W2K3 Cluster
    ... This is a somehow thrown together cluster and I ... Taking resource offline... ... Offline of resource continuing... ...
    (microsoft.public.windows.server.clustering)
  • Re: cluster scripting suggestions?
    ... cluster to house all of its data. ... are offline when data replication begins. ... Begin data replication. ... I've also determined how to take the desired resources offline via the ...
    (microsoft.public.windows.server.clustering)
  • Re: Is a failover logged in Cluster.log?
    ... and it logs most of the activities using the GUID's of the cluster objects ... depends on SQL Server (SQL1). ... Offline of resource continuing... ...
    (microsoft.public.sqlserver.clustering)
  • Re: Is a failover logged in Cluster.log?
    ... The main reason is that it logs far more than what you would care, ... and it logs most of the activities using the GUID's of the cluster objects ... depends on SQL Server (SQL1). ... Offline of resource continuing... ...
    (microsoft.public.sqlserver.clustering)
  • Re: Disk Ressource are pointin to different disks
    ... during the time when they wanted to bring this group online the ... disk was presented to the cluster and not offline as it was most of the ...
    (microsoft.public.windows.server.clustering)

Loading