Re: Correcting corrupt $MFT on shared clustered disk.

please do NOT multipost

--
Cordialement,
Mathieu CHATEAU
http://lordoftheping.blogspot.com


"Mike O." <MikeO@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:F3CB8885-569A-493A-BAE1-597ED71C3D4A@xxxxxxxxxxxxxxxx
I'm trying to find out some information about using CHKDSK on a clustered
drive.
We have a two-node cluster (active/passive) running Windows 2003 R2
Enterprise 32-bit with SP1. The cluster has three shared drives located on
an EMC CX700 SAN. The three drives are a 500MB quorum drive and two data
drives: 1.5TB (drive "E") and 2.4TB (drive "W"). Drive E is a basic disk; the
2.4TB drive W is a GPT disk. They're both about 70% full. The E drive has
been active for about a year; the W drive was added around June.

Yesterday the active node became sluggish and then stopped serving data. It
still responded to low-level stuff like PING, but users were getting errors
on the server, and logging in gave a blank screen. This has happened a couple
of times before (that's a separate issue we're looking into).

We went to the inactive node and did a "move group" in the cluster
administrator. We've done this before for various reasons with no problems,
it usually takes about 20 seconds to bring the resources up on the other node.

This time, when the resources came online on the 2nd node, we started
getting an application popup saying "Windows - Corrupt File : The file or
directory E:\$Mft is corrupt and unreadable. Please run the Chkdsk utility."
The drive seems to be running OK with users accessing the information
normally. I did some research and it appears that Windows will use the
duplicate copy of the MFT if the primary one is corrupted.

I know we need to run CHKDSK soon, but unfortunately, running chkdsk and
taking the drive off line for several hours is not something we can do
during daytime hours. If necessary we could run it overnight, but with that
size of drive I don't know if it would finish by the next morning.

The server has dual fiber connections (we're using the EMC Powerpath
software for SAN failover), and we didn't have anything happen with the SAN
at that time. Based on the timing, I'm assuming the MFT corruption was
related to the cluster failover, not a physical hardware issue, so I wasn't
planning on running the sector scan. I would imagine a sector scan on a
1.5TB "disk" would run for a while…
At this point I'm planning on running CHKDSK over the weekend. I've never
run it on a clustered disk before and I'm looking for some information about
it. I've read Microsoft KB176970 and KB903650, but frankly they're a little
confusing with the issues about "maintenance mode".

Also, is my understanding about the mirrored/secondary MFT valid? Since
users appear to be getting information correctly, can the CHKDSK wait until
the weekend? Our backup policy does a full backup each week and an
incremental daily, so if something really bad happens we should be able to
recover.

Any information on this would be appreciated.

Mike O.

