Re: Bad RAID Configuration Need Rebuild 1st DC
From: Dan Foxley (dfoxley_at_nospampacificdatavision.com)
Date: 03/18/05
- Next message: Fredrick: "Re: I am trying to set up a server so others can get drivers, etc."
- Previous message: Robert Craig: "Re: I am trying to set up a server so others can get drivers, etc."
- In reply to: Tim: "Re: Bad RAID Configuration Need Rebuild 1st DC"
- Next in thread: Rebecca Chen [MSFT]: "Re: Bad RAID Configuration Need Rebuild 1st DC"
- Reply: Rebecca Chen [MSFT]: "Re: Bad RAID Configuration Need Rebuild 1st DC"
Date: Thu, 17 Mar 2005 20:43:45 -0800
Tim, I hear ya. I'm thinking this is a strange story I'm getting. OK, so:
This is an embedded SATA RAID controller on an IBM x206 Series server.
There are 2 ways to configure/view the status of the RAID controller: Ctrl
+ A during boot up, and an IBM-branded version of the Adaptec Storage Manager
Browser Version called IBM ServeRAID Manager. ServeRAID Manager seems to
install components as needed depending on the detected controller. I have a
RAID1 configured across 2 x 160 GB Maxtor SATA disks (actually an IBM part,
as they come in a tray that is hard to find and not sold separately), on
Port 0 and Port 1. Port 1 reported "defunct". IBM shipped out another drive.
I had a colleague swap out Port 1 and select rebuild. As I have now found
out, the rebuild consistently fails at 41%. Below is the chain of events
from the log file (attached as well):
03/16/2005 02:43:21 PM EST pdv-nj Rebuilding: controller 1, logical drive 1 ("OS_Data").
03/16/2005 03:05:48 PM EST pdv-nj SetInformationEvent,ScsiStatGenerated,DeviceEvent,MediumError
03/16/2005 03:05:48 PM EST pdv-nj SetInformationEvent,ScsiStatGenerated,MgtUpdate,RecreateFailed
03/16/2005 03:05:48 PM EST pdv-nj Defunct drive: controller 1, port 1
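For anyone reading along, here is a minimal Python sketch (not part of the original post) that works out how long the rebuild ran before it failed, using the timestamps from the log excerpt above. The 160 GB capacity and the 41% figure come from the post; pairing them with this particular log run to estimate a copy rate is an assumption, and the names LOG, FMT and rebuild_duration are made up for illustration.

```python
from datetime import datetime

# Timestamps and event text copied from the log excerpt (time zone label dropped).
LOG = [
    ("03/16/2005 02:43:21 PM", "Rebuilding: controller 1, logical drive 1"),
    ("03/16/2005 03:05:48 PM", "DeviceEvent,MediumError"),
    ("03/16/2005 03:05:48 PM", "MgtUpdate,RecreateFailed"),
    ("03/16/2005 03:05:48 PM", "Defunct drive: controller 1, port 1"),
]

FMT = "%m/%d/%Y %I:%M:%S %p"  # month/day/year, 12-hour clock with AM/PM


def rebuild_duration(log):
    """Return the time from the 'Rebuilding' event to the first 'RecreateFailed'."""
    start = next(datetime.strptime(ts, FMT) for ts, msg in log
                 if msg.startswith("Rebuilding"))
    fail = next(datetime.strptime(ts, FMT) for ts, msg in log
                if "RecreateFailed" in msg)
    return fail - start


elapsed = rebuild_duration(LOG)
print(elapsed)  # 0:22:27

# Assumption: this run is one of the rebuilds that stopped at 41% of a 160 GB disk.
copied_gb = 0.41 * 160
rate_mb_per_s = copied_gb * 1000 / elapsed.total_seconds()
print(f"roughly {copied_gb:.1f} GB copied at about {rate_mb_per_s:.1f} MB/s")
```

If that assumption holds, the rebuild was running at a healthy rate right up to the MediumError, which suggests it stopped on a read problem and not because the copy was slow.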
I had an IBM contractor come out and spend the day on this issue (the server
is in a branch office). During the Ctrl + A session both drives were
"Verified". Drive 0 (the non-defunct drive) doesn't show an error but, as he
puts it, "pauses" at about 41% during the verify (done twice). They are
calling this a bad STRIPE in the RAID1 and say that only wiping the RAID1
and rebuilding it will repair this, hence a backup and restore of the DC.
They also say that imaging the HD with Ghost or similar will put the bad "stripe" back on
the drive and will result in it not being able to rebuild the mirror.
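One possible reading of the symptoms above, sketched as a toy Python model (not from the original post, and not how the Adaptec/IBM firmware is known to work): if the surviving drive has an unreadable region, a block-by-block mirror rebuild would stop at the same point on every attempt. The function names, the 100-block disk and the bad block at index 41 are all hypothetical, chosen only to mirror the 41% figure.

```python
class MediumError(Exception):
    """Raised when a block on the source disk cannot be read."""


def read_block(disk, index, bad_blocks):
    # Simulate a disk read: blocks listed in bad_blocks are unreadable.
    if index in bad_blocks:
        raise MediumError(f"unreadable block {index}")
    return disk[index]


def rebuild_mirror(source, bad_blocks):
    """Copy every block from source to a fresh target, in order.

    Returns (ok, percent_done, target). The rebuild aborts on the first
    block it cannot read from the source.
    """
    target = []
    total = len(source)
    for index in range(total):
        try:
            target.append(read_block(source, index, bad_blocks))
        except MediumError:
            return False, 100 * index // total, target
    return True, 100, target


# Hypothetical disk of 100 blocks with one unreadable block at index 41.
source = [f"data-{i}" for i in range(100)]
ok, percent, _ = rebuild_mirror(source, bad_blocks={41})
print(ok, percent)  # False 41

# Retrying does not help in this model: the bad block is still on the source.
attempts = [rebuild_mirror(source, {41})[1] for _ in range(3)]
print(attempts)  # [41, 41, 41]
```

In this model, swapping the target drive changes nothing, because the failure is on the read side. Whether that is what is happening on this controller is an assumption the log alone does not confirm.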
There is nothing meaningful in the log file, it seems. Since this is a
branded embedded controller, I don't think IBM can give me the straight
scoop, and I'm guessing Adaptec ($80 incident fee) would tell me to call
IBM. Also, with this being a lower-end segment (SATA etc.), they dumbed down
ServeRAID Manager/Storage Manager so it doesn't show any specifics.
I would love some suggestions to get around having to rebuild/backup/restore
this DC. Any thoughts?
"Tim" <Tim@NoSpam> wrote in message
news:ea0XlR1KFHA.2764@tk2msftngp13.phx.gbl...
> Does the Adaptec controller software indicate if there is a failed drive?
>
> If it does - and it sounds like it should have - then you should replace
> the failed drive.
>
> The normal method is to identify which physical drive it is that has
> failed,
> power down (if it is not hotswap), remove it, replace it and power up.
> Most
> raid systems will auto recover the failed drive when a new drive is
> installed. On Intel RAID 1, the recovery is dead slow - about 1 minute per
> Gigabyte if it runs while the server OS is running (Yes while the OS is
> running). I have no idea of the time the Adaptec may take, so be prepared
> for poor performance for an extended interval - consider doing a RAID bios
> level sync of the drives as per the instructions if this is not made
> clear.
>
> It is imperative to be certain that you correctly identify the failed
> drive
> (well, it's not fatal if you preserve the drive you remove, but it can lead
> you down a very time consuming and fruitless path).
>
> I would recommend thoroughly checking the Adaptec documentation and help
> files, and using the installed controller software to aid in this process.
> The Adaptec controllers are as good as any and *should* operate exactly as
> you expect in a failed drive scenario. Consider using Adaptec support.
>
> The thing I have found confusing in your post is you say RAID 1, then
> mention Mirror - which is what RAID 1 is, then make mention of Stripe.
> Stripe is normally associated with RAID 0 which seems to be totally
> irrelevant to this setup unless of course it is a RAID 10 or 0+1
> configuration. (RAID 0 is to be avoided on servers like the plague).
>
> Sometimes RAID controllers will 'fail' drives for some odd reason. If the
> drive is removed and tested they often test 100% OK. This is quite normal.
> As a general rule, if a RAID controller fails a drive it does so to
> protect
> data - that is its primary purpose after all. The controllers can fail a
> drive when there is no 'actual' fault with the drive. So the general rule
> is
> to replace the drive - even if it does test OK. In some situations people
> do
> recommission drives that have failed - as standby drives or they take a
> somewhat informed risk by re-using the drive. I do not recommend this in
> this scenario.
>
> Simple question: Is the data on the drive worth more than the drive? If
> yes,
> get a new drive if it is marked as failed.
>
> - Tim
>
>
>
>
> "Dan Foxley" <dfoxley@nospampacificdatavision.com> wrote in message
> news:%23KZak3xKFHA.3380@TK2MSFTNGP10.phx.gbl...
>> Sorry for not being more clear. The DC is still up but must be rebuilt.
>> The RAID1 mirror (hardware via Adaptec Host Based SATA RAID controller)
>> has failed but the OS is still up, I'm just not able to rebuild the
>> mirror
>> as I'm told that there is a bad strip, and any attempts to rebuild fail even
>> though there is no specific error. I was told if I image the OS then
>> restore (Ghost or similar) the bad strip would still exist and further
>> attempts to rebuild the mirror will fail.
>>
>> So I'm looking for the best solution to backup and restore this DC!
>> Should I backup and do a restore after an OS re-install? Should I
>> demote
>> the DC then re-install the OS and re-join the domain?
>>
>> Thanks,
>> Dan
>>
>> ""Rebecca Chen [MSFT]"" <v-rebc@online.microsoft.com> wrote in message
>> news:SePsVltKFHA.1376@TK2MSFTNGXA02.phx.gbl...
>>> Hi Dan,
>>>
>>> I understand that one stripe is broken in the OS RAID 1 on one of your
>>> DCs.
>>> I would like to confirm: do you mean you set up RAID 1 for the whole
>>> system on the first DC and it now fails to start up? If this is not the
>>> case, please provide detailed information about your scenario so that I
>>> can get a whole picture of it.
>>>
>>> If this is the case, according to the nature of the RAID 1, stripe sets
>>> offer no data redundancy, unlike striping with parity. All data is lost
>>> in
>>> the set if one drive fails in a stripe set. Therefore, the data on the
>>> bad
>>> stripe cannot be recovered. Please note that our Partner support
>>> newsgroups are focused on break-fix scenarios; lost-data recovery is not
>>> provided. You may consider contacting a third-party company to recover
>>> data from the bad stripe, which may restore all data.
>>>
>>> On the OS side, I would like to provide some information about the AD
>>> information on a domain controller. Since you have AD/DNS/DHCP/Veritas
>>> BackupExec 10 /File and Print Sharing running on this failed server,
>>> technically speaking, if you have replicated DNS and DHCP to other two
>>> machines, you can use the following steps to get the AD/DNS/DHCP
>>> information back:
>>>
>>> 1. Refer to the following article to seize the FSMO roles on the other
>>> DCs, since the first DC has failed:
>>>
>>> Using Ntdsutil.exe to transfer or seize FSMO roles to a domain
>>> controller
>>> http://support.microsoft.com/kb/255504
>>>
>>> 2. Install win2k3 on a good hard disk (let us call it Srv1) and promote it
>>> to be an additional DC in the network. During this period, please install
>>> AD-integrated DNS on the server; this process will replicate all AD-DNS
>>> information from another DC to this server.
>>>
>>> 3. Refer to the following article to backup and restore the DHCP
>>> database
>>> to the Srv1:
>>>
>>> Backing up the DHCP database
>>> http://www.microsoft.com/resources/documentation/WindowsServ/2003/standard/proddocs/en-us/Default.asp?url=/resources/documentation/windowsserv/2003/standard/proddocs/en-us/sag_DHCP_und_DatabaseBackup.asp
>>>
>>> Restoring server data
>>> http://www.microsoft.com/resources/documentation/WindowsServ/2003/standard/proddocs/en-us/Default.asp?url=/resources/documentation/WindowsServ/2003/standard/proddocs/en-us/sag_dhcp_tro_RestoringData.asp
>>>
>>> HTH!
>>>
>>> If you have any questions or update, please feel free to post back.
>>>
>>> Best regards,
>>>
>>> Rebecca Chen
>>>
>>> MCSE2000 MCDBA CCNA
>>>
>>>
>>> Microsoft Online Partner Support
>>> Get Secure! - www.microsoft.com/security
>>>
>>> =====================================================
>>>
>>> When responding to posts, please "Reply to Group" via your newsreader so
>>> that others may learn and benefit from your issue.
>>>
>>> =====================================================
>>> This posting is provided "AS IS" with no warranties, and confers no
>>> rights.
>>>
>>
>>
>
>