Re: Advice on RAID crash
- From: "Philip E." <groups@xxxxxxxxxxxxxxxx>
- Date: Wed, 9 Jan 2008 11:34:14 -0700
chkdsk /f /r did work for us on a large RAID 5 array with failing sectors, but the cascade effect eventually brought the whole array down. While the controller does "mask" things to some degree, as I understand it, it cannot hide bad sectors from the OS in areas where data is actively being written.
If the data is okay, get rid of the controller. You should be able to plug in one of Adaptec's newer RAID controllers and have it natively pick up the existing array configuration from the existing drives.
If the OS is on the original array, plug the new controller into the box first and load its drivers into the OS. Shut down, pull the old controller, and move everything over. On bootup the OS should load the drivers for the new controller with no issues.
Make sure the drive positions and array info do not change during the process, and make sure you have a good backup ahead of time. We recommend looking at StorageCraft's ShadowProtect (www.storagecraft.com) as a great way to create a snapshot. You could use this product to make a snapshot and restore it to a new set of drives after the fact.
--
Philip E.
MPECS Inc.
Microsoft Small Business Specialists
http://blog.mpecsinc.ca
"Al Williams" <donotreplydirect@xxxxxxxxxxxxxxxx> wrote in message news:eeYbS0tUIHA.3916@xxxxxxxxxxxxxxxxxxxxxxx
No errors with the chkdsk; I also ran a RAID verify in Storage Manager with no issues. The system log errors seem to point to the entire array simply dropping offline until I was able to reboot the box. There was no corruption to any of the volumes or databases on the array. I suppose I can set up a test system (I have a spare SCSI card) and start pulling out the drives one at a time and testing them (I have a spare drive) to see if any are bad, but this doesn't seem like the issue here.
Also, regarding chkdsk /f and bad sector checking - does this actually do anything on hardware RAID controllers? I would think the RAID controller would hide sector issues from the OS the same way it hides the actual array makeup.
Thanks.
--
Allan Williams
"Philip E." <groups@xxxxxxxxxxxxxxxx> wrote in message news:evqCyqlUIHA.5836@xxxxxxxxxxxxxxxxxxxxxxx
In your case, you need a SCSI controller you can plug each individual drive into. Then download and set up the Quantum/Maxtor/Seagate test tools, and boot into them to physically test each drive.
The RAID controller was a Promise.
For SBS, Seagate's Enterprise SATA drives will more than do the job. We have many in production and are quite happy with them.
We run with Intel through and through on our servers. So, SRCSAS18E: http://www.intel.com/design/servers/raid/srcsas18e/index.htm
Or the new one SRCSASJV: http://www.intel.com/design/servers/raid/srcsas18e/index.htm
Both are PCI-E 8x which means the drives can be stretched to their max.
--
Philip E.
MPECS Inc.
Microsoft Small Business Specialists
http://blog.mpecsinc.ca
"Al Williams" <donotreplydirect@xxxxxxxxxxxxxxxx> wrote in message news:e01$9clUIHA.484@xxxxxxxxxxxxxxxxxxxxxxx
The disks are Atlas 10K2/15K models and are getting up there (5+ years for some), but I've had disk errors with this controller (which is also 5+ years old) before and they always showed up in the logs. That is what has me confused, as I've never had an issue with Adaptec controllers - they are always bulletproof. Was yours an Adaptec controller as well?
I think I have a single-channel RAID card lying about, so I will set up a test system to test the drives as I swap them out. What did you use to test the drives?
You may be right about looking into a new controller and drive setup. What would you recommend: SAS, SATA, or stick with U320?
Thx
--
Allan Williams
"Philip E." <groups@xxxxxxxxxxxxxxxx> wrote in message news:udGeIKlUIHA.5816@xxxxxxxxxxxxxxxxxxxxxxx
How old are the drives and the RAID controller?
Suggestion: Budget for a new box, or at least a new RAID setup.
We had Event ID 15s on a 6-drive RAID 5 array that turned out to be caused by bad sectors on one of the drives in the array, which neither the controller nor the OS picked up on. We only figured that out after we replaced the controller and drives and could test each drive individually.
But, we needed to go through some pain to get things back first.
Run chkdsk /f /r on the array. Note that it can take a huge chunk of time to complete.
If it is indeed bad sectors on one of the array members, the problem may not show itself again for a while. Part of the problem is that a chkdsk /f /r on the dismounted partition still may not fix things completely, as the bad sectors can keep spreading beyond the areas the OS marked as bad when chkdsk was run.
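For anyone who wants to script the chkdsk run, here is a minimal Python sketch. The function names and the `E:` drive letter are hypothetical, and the exit-code meanings are my reading of Microsoft's chkdsk documentation, so verify them against your Windows version before relying on them.

```python
import re
import subprocess
import sys

# Exit-code meanings as I understand Microsoft's chkdsk documentation
# (an assumption to verify for your Windows version).
CHKDSK_EXIT_CODES = {
    0: "no errors found",
    1: "errors found and fixed",
    2: "disk cleanup performed, or skipped because /f was not given",
    3: "could not check the disk, or errors were not fixed",
}


def build_chkdsk_command(volume, fix=True, recover=True):
    """Return the chkdsk argument list for a drive letter such as 'E:'."""
    if not re.fullmatch(r"[A-Za-z]:", volume):
        raise ValueError("expected a drive letter like 'E:', got %r" % volume)
    cmd = ["chkdsk", volume.upper()]
    if fix:
        cmd.append("/f")  # fix file system errors
    if recover:
        cmd.append("/r")  # locate bad sectors and recover readable data
    return cmd


def run_chkdsk(volume):
    """Run chkdsk /f /r on the volume and describe its exit code."""
    if sys.platform != "win32":
        raise RuntimeError("chkdsk is only available on Windows")
    returncode = subprocess.run(build_chkdsk_command(volume)).returncode
    return returncode, CHKDSK_EXIT_CODES.get(returncode, "unknown exit code")


if __name__ == "__main__":
    # 'E:' is a placeholder for the volume that lives on the array.
    print(build_chkdsk_command("E:"))
```

If the volume is in use, chkdsk will prompt to dismount it or to schedule the check for the next reboot, so run this from an elevated console where you can answer the prompt.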
--
Philip E.
MPECS Inc.
Microsoft Small Business Specialists
http://blog.mpecsinc.ca
"Al Williams" <donotreplydirect@xxxxxxxxxxxxxxxx> wrote in message news:eQeaBqkUIHA.4696@xxxxxxxxxxxxxxxxxxxxxxx
SBS2003 Premium SP2
Recently our Adaptec 2100 SCSI U160 RAID5 array went down and caused a bunch of errors like this in the system log:
Event Type: Error
Event Source: dpti2o
Event Category: None
Event ID: 15
Date: 08-01-07
Time: 8:18:55 AM
User: N/A
Computer: HISERVER
Description:
The device, \Device\Scsi\dpti2o1, is not ready for access yet.
----------------------
Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 08-01-07
Time: 8:18:55 AM
User: N/A
Computer: HISERVER
Description:
The driver detected a controller error on \Device\Harddisk0.
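The two entries quoted above carry the same timestamp, which is what suggests the controller and the disk stack failed together. A rough Python sketch of how pasted event text in this layout could be parsed and grouped by time follows; the function names are hypothetical and the parser assumes exactly the copy-and-paste format shown here, nothing more general.

```python
import re
from collections import defaultdict

# Sample text copied from the two System log entries quoted in this message.
SAMPLE_LOG = r"""
Event Type: Error
Event Source: dpti2o
Event Category: None
Event ID: 15
Date: 08-01-07
Time: 8:18:55 AM
User: N/A
Computer: HISERVER
Description:
The device, \Device\Scsi\dpti2o1, is not ready for access yet.
----------------------
Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 08-01-07
Time: 8:18:55 AM
User: N/A
Computer: HISERVER
Description:
The driver detected a controller error on \Device\Harddisk0.
"""

FIELD = re.compile(
    r"^(Event Type|Event Source|Event Category|Event ID|Date|Time|User|Computer):\s*(.*)$"
)


def parse_events(text):
    """Split pasted event text on dashed separator lines into dicts."""
    events = []
    for chunk in re.split(r"^-{5,}[ \t]*$", text, flags=re.MULTILINE):
        event, description, in_description = {}, [], False
        for line in chunk.strip().splitlines():
            line = line.strip()
            if in_description:
                description.append(line)
            elif line == "Description:":
                in_description = True
            else:
                match = FIELD.match(line)
                if match:
                    event[match.group(1)] = match.group(2)
        if event:
            event["Description"] = " ".join(description)
            events.append(event)
    return events


def group_by_timestamp(events):
    """Map (date, time) to the list of (source, event ID) pairs logged then."""
    groups = defaultdict(list)
    for event in events:
        key = (event.get("Date"), event.get("Time"))
        groups[key].append((event.get("Event Source"), event.get("Event ID")))
    return dict(groups)


if __name__ == "__main__":
    for stamp, entries in group_by_timestamp(parse_events(SAMPLE_LOG)).items():
        print(stamp, entries)
```

Run over a longer export of the System log, the same grouping would show whether every Disk Event ID 11 lines up with a dpti2o Event ID 15, or whether one of them ever appears alone.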
This RAID array contains some data and the Exchange and SQL databases. Other drives are on a separate RAID mirror, so the server continued to run as best it could but needed several reboots to get going Monday morning (lots of SQL and Exchange errors in the logs due to the drive going offline). After rebooting a few times the server has been fine ever since (I did a full backup as well).
The strange thing is I checked the Adaptec Storage Manager logs and it shows no errors. As far as it is concerned the RAID array is fine - none of the disks show any faults and the array did not degrade.
No changes have been made recently to the server except for a couple of the usual monthly Windows updates. I am leaning towards this being a hardware problem, as the server locked up tight a few times while attempting to recover from the issue Monday (it gave a STOP 100000ea as well). Could the RAID card itself be having issues?
Any ideas on how I can determine what caused the RAID to go offline?
Thanks,
--
Allan Williams