Re: Correcting corrupt $MFT on shared clustered disk.
- From: "Mike O" <put_the_spam@xxxxxxx>
- Date: Thu, 27 Sep 2007 21:08:53 -0400
Comments interspersed below:
"John Toner [MVP]" <jtoner@xxxxxxxxxxxxxxxxxxxxx> wrote in message news:ueYAS1SAIHA.4444@xxxxxxxxxxxxxxxxxxxxxxx
You need to put the cluster disk in maintenance mode (not extended maintenance
mode) to run chkdsk against a cluster drive; you can't run chkdsk in
extended maintenance mode. Extended maintenance mode is mostly used for
VSS-type operations against a cluster disk.
When you run chkdsk against the drive, you will need to close all open
handles on the disk. The best approach is to take all cluster resources
offline and then bring only the disk resource online to run chkdsk.
Thank you for the information. I had already found the maintenance mode KB article. After digging into it, I realized I didn't need extended maintenance mode, and the rest of the article made sense.
The cluster group has two disks, each with shares on it. If I take only this disk's file share resources offline, then take the disk resource offline and back online, there should be no effect on the other disk and its file shares, even though they're in the same cluster group, correct?
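The order of operations discussed above can be sketched as a short Python helper that only builds the command list for review and never executes anything. It follows the narrower variant asked about here (take only this disk's share resources offline, use plain maintenance mode); whether that leaves the other disk unaffected is the open question, not something the sketch proves. The resource names, the drive letter, and the `cluster.exe` switches (`/offline`, `/online`, `/maint:on|off`) are assumptions to check against the maintenance mode KB article before use.

```python
# Hypothetical resource names: substitute the names shown in Cluster Administrator.
DISK_RESOURCE = "Disk S:"                       # hypothetical disk resource to check
DRIVE = "S:"                                    # hypothetical drive letter of that disk
SHARES_ON_DISK = ["Share Dept1", "Share Dept2"]  # hypothetical share resources on this disk only


def chkdsk_plan(disk_resource, drive, shares):
    """Return, in order, the commands for a maintenance-mode chkdsk of one disk.

    Assumes cluster.exe accepts /offline, /online and /maint:on|off for a
    resource; verify this against the KB article. Nothing is executed here.
    """
    plan = []
    # 1. Take only this disk's file share resources offline so they release their handles.
    for share in shares:
        plan.append(f'cluster res "{share}" /offline')
    # 2. Put the disk in plain (not extended) maintenance mode; extended
    #    maintenance mode does not allow chkdsk.
    plan.append(f'cluster res "{disk_resource}" /maint:on')
    # 3. Run chkdsk in fix mode against the drive.
    plan.append(f"chkdsk {drive} /f")
    # 4. Leave maintenance mode, then bring the shares back online.
    plan.append(f'cluster res "{disk_resource}" /maint:off')
    for share in shares:
        plan.append(f'cluster res "{share}" /online')
    return plan


if __name__ == "__main__":
    for command in chkdsk_plan(DISK_RESOURCE, DRIVE, SHARES_ON_DISK):
        print(command)
```

Building the list instead of running it keeps the sketch safe to try anywhere and lets the steps be reviewed before the maintenance window.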
FYI, this type of corruption can occur if you had open handles on the drive
at the time of the cluster failover. See
http://support.microsoft.com/kb/321939 for more details.
That sounds like what happened. We didn't have non-cluster shares on the drive, but since the active node was in the process of crashing when the inactive node took over the resources, one or more of the shares probably weren't closed cleanly before the disk switched over.
Also, I have a sneaking suspicion that if you engage EMC, their support reps
will tell you that there is nothing wrong with the physical disk or CX array
and that they do not support file system issues. You'll likely need to
engage Microsoft to support any issues with $MFT.
I opened the ticket with EMC earlier today. They had me run the SP and grab reports. They told me the physical disk structure was OK, although they did identify some other, unrelated issues. The tech said he would pass the ticket on to their Windows group and that they would call me (they haven't yet).
I've seen chkdsk runs take days to complete. I don't think there's any way to
really calculate how long it will take other than testing it yourself. In
general, the more files/directories a volume has, the longer it will take to
chkdsk.
Currently we have a maintenance window from 6:00 pm Saturday until about 6:00 am Monday, so hopefully 36 hours will be enough to correct the issue. I ran chkdsk in read-only mode; it identified about 80 things it needed to fix, then said it couldn't continue in read-only ("non-fix") mode.
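As a quick check of the window length, the Python snippet below computes the hours between 6:00 pm Saturday and 6:00 am Monday. The dates are illustrative only: 29 Sep 2007 is the Saturday after this post's date and 1 Oct 2007 the following Monday.

```python
from datetime import datetime


def window_hours(start: datetime, end: datetime) -> float:
    """Return the length of a maintenance window in hours."""
    return (end - start).total_seconds() / 3600


# Illustrative dates: Saturday 29 Sep 2007, 6:00 pm to Monday 1 Oct 2007, 6:00 am.
start = datetime(2007, 9, 29, 18, 0)
end = datetime(2007, 10, 1, 6, 0)
hours = window_hours(start, end)
print(hours)  # 36.0
```

Since chkdsk duration can't really be predicted, the 36 hours is best treated as an upper bound that also has to cover bringing resources back online afterward.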
This is why many folks recommend splitting your disks into
smaller LUNs and using mount points instead of one huge disk. Also, the
restore time on a 2TB+ LUN can be pretty significant.
I thought about using mount points, but the problem we have is space allocation. As I understand it, a mount point (junction point) looks like a subfolder but is actually a connection to another drive (or a partition on another drive). This server holds files for about 12 departments. Prior to this cluster, most of them had their own servers. If I assign separate drives to different departments, I figure we'll eventually end up with the same problem we had before: some departments running out of space while others have space to spare. I'm trying to avoid having to continually manage and move user files.
Hope this helps.
Regards,
John (EMC support rep)
Visit my blog: http://msmvps.com/blogs/jtoner
Thank you again.
Mike O.
"Mike O" <put_the_spam@xxxxxxx> wrote in message
news:ObkBx6KAIHA.4568@xxxxxxxxxxxxxxxxxxxxxxx
I had seen the "hangrecoveryaction" issue and set it after this happened one
other time (the disk didn't get corrupted then), but I'm not sure why it
didn't kick in this time.
As for the other part, I saw the "extended maintenance mode" KB article, but
I wasn't quite sure how it would work with running chkdsk. I was hoping
maybe someone had done this already and could give me some pointers.
I hadn't thought about using EMC for this. I guess I figured that since it
was more of a Windows issue, not hardware- or SAN-related, it wouldn't be
in their area.