Re: Advice on RAID crash

chkdsk /f /r did work for us on a large RAID 5 array with failing sectors. But the cascade effect eventually brought the whole array down. While the controller does "mask" things to some degree, as I understand it, it cannot hide bad sectors from the OS in areas where data is being actively written.
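
The "cascade" follows from how single-parity RAID 5 works: each stripe carries one XOR parity block, so the array can rebuild any one unreadable member of a stripe but not two. Once one drive has bad sectors, a read error on a second drive in the same stripe means that data is gone. Here is a minimal Python sketch of the parity arithmetic only. It is not how an Adaptec or any other controller implements it: parity rotation across drives is left out, and the block contents and function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block (single parity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which unreadable members are marked as None.

    XOR of all surviving members reproduces one missing member, whether it held
    data or parity. With two or more members unreadable there is not enough
    information left, which is where a RAID 5 array loses data.
    """
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members unreadable; single parity cannot recover")
    rebuilt = list(stripe)
    if missing:
        rebuilt[missing[0]] = xor_blocks([blk for blk in stripe if blk is not None])
    return rebuilt


if __name__ == "__main__":
    stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])  # three data drives + parity
    stripe[1] = None                                   # bad sector on drive 1
    print(rebuild(stripe)[1])                          # prints b'BBBB'
    stripe[2] = None                                   # a second drive fails to read
    try:
        rebuild(stripe)
    except ValueError as err:
        print("stripe lost:", err)
```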

If the data is okay ... get rid of the controller. You should be able to plug in one of Adaptec's newer RAID controllers and have it natively pick up the existing array configuration on the existing drives.

If the OS is on the original array, then plug the new controller into the box first and load its drivers into the OS. Shut down, pull the old controller and move everything over. On bootup the OS should pick up the drivers for the new controller with no issues.

Make sure the drive positions and array info do not change during the process, and make sure you have a good backup ahead of time. We recommend StorageCraft's ShadowProtect (www.storagecraft.com) as a great way to create a snapshot. You could use it to take a snapshot and restore to a new set of drives after the fact.
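
One way to act on the "drive position" advice is to write down which drive (by serial number) sits in which channel:ID slot before the swap, then check the same list once the new controller is in. Below is a minimal Python sketch of that comparison. The slot labels and serial numbers are hypothetical, and collecting the lists (for example by hand from the controller's own utility) is assumed rather than automated.

```python
def diff_slot_maps(before, after):
    """Compare {slot: drive_serial} maps recorded before and after a controller swap.

    Returns a list of problems; an empty list means every drive is still in the
    slot it occupied before the swap.
    """
    problems = []
    for slot in sorted(set(before) | set(after), key=str):
        old, new = before.get(slot), after.get(slot)
        if old == new:
            continue
        if new is None:
            problems.append(f"slot {slot}: drive {old} is missing")
        elif old is None:
            problems.append(f"slot {slot}: unexpected drive {new}")
        else:
            problems.append(f"slot {slot}: expected {old}, found {new}")
    return problems


if __name__ == "__main__":
    # Hypothetical channel:ID slots and serial numbers, for illustration only.
    before = {"0:0": "SN-A1", "0:1": "SN-B2", "0:2": "SN-C3"}
    after = {"0:0": "SN-A1", "0:1": "SN-C3", "0:2": "SN-B2"}
    for line in diff_slot_maps(before, after) or ["all drives in their original slots"]:
        print(line)
```

Any non-empty result is a reason to stop before letting the new controller import the array.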

--

Philip E.
MPECS Inc.
Microsoft Small Business Specialists
http://blog.mpecsinc.ca
"Al Williams" <donotreplydirect@xxxxxxxxxxxxxxxx> wrote in message news:eeYbS0tUIHA.3916@xxxxxxxxxxxxxxxxxxxxxxx
No errors with the chkdsk; I also ran a RAID verify in Storage Manager with no issues. The system log errors seem to point to the entire array simply dropping offline until I was able to reboot the box. There was no corruption to any of the volumes or databases on the array. I suppose I could set up a test system (I have a spare SCSI card) and start pulling out the drives one at a time and testing them (I have a spare) to see if any are bad, but this doesn't seem like the issue here.

Also, regarding chkdsk /f and bad sector checking - does this actually do anything on hardware RAID controllers? I would think the RAID controller would hide sector issues from the OS the same way it hides the actual array makeup.

Thanks.

--
Allan Williams



"Philip E." <groups@xxxxxxxxxxxxxxxx> wrote in message news:evqCyqlUIHA.5836@xxxxxxxxxxxxxxxxxxxxxxx
In your case, you need a SCSI controller you can plug each individual drive into, then download and set up the Quantum/Maxtor/Seagate test tools to boot into so you can physically test each drive.

The RAID controller was a Promise.

For SBS, Seagate's Enterprise SATA drives will more than do the job. We have many in production and are quite happy with them.

We run with Intel through and through on our servers. So, SRCSAS18E: http://www.intel.com/design/servers/raid/srcsas18e/index.htm

Or the new one SRCSASJV: http://www.intel.com/design/servers/raid/srcsas18e/index.htm

Both are PCI-E x8, which means the drives can be pushed to their max.
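
The x8 point is bus arithmetic: a first-generation PCI Express lane runs at 2.5 GT/s with 8b/10b encoding, which works out to 250 MB/s per lane in each direction, so an x8 link gives about 2,000 MB/s. An Ultra160 SCSI channel, like the one on the Adaptec 2100 in this thread, is a single 160 MB/s bus shared by every drive on it. Here is a small Python sketch of the comparison. It assumes a first-generation link (the thread does not say which generation these cards use) and a hypothetical 80 MB/s sustained throughput per drive; substitute real figures for your drives.

```python
def pcie_link_mb_s(lanes, gt_per_s=2.5, data_bits=8, line_bits=10):
    """Usable one-direction bandwidth of a PCI Express link, in MB/s.

    Defaults assume a first-generation link: 2.5 GT/s per lane with 8b/10b
    encoding, i.e. 250 MB/s per lane.
    """
    usable_bits_per_s = lanes * gt_per_s * 1e9 * data_bits / line_bits
    return usable_bits_per_s / 8 / 1e6


def bus_is_bottleneck(bus_mb_s, drives, per_drive_mb_s):
    """True if the drives' combined sequential throughput meets or exceeds the bus."""
    return drives * per_drive_mb_s >= bus_mb_s


if __name__ == "__main__":
    per_drive = 80  # hypothetical sustained MB/s per drive; substitute real figures
    x8 = pcie_link_mb_s(8)
    print(f"PCIe x8 (gen 1): {x8:.0f} MB/s, bottleneck with 8 drives: "
          f"{bus_is_bottleneck(x8, 8, per_drive)}")
    # An Ultra160 SCSI channel is a shared 160 MB/s bus, for comparison.
    print(f"U160 channel: 160 MB/s, bottleneck with 4 drives: "
          f"{bus_is_bottleneck(160, 4, per_drive)}")
```

With those assumed numbers, eight drives use about a third of the x8 link, while four drives already saturate one U160 channel.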

--

Philip E.
MPECS Inc.
Microsoft Small Business Specialists
http://blog.mpecsinc.ca
"Al Williams" <donotreplydirect@xxxxxxxxxxxxxxxx> wrote in message news:e01$9clUIHA.484@xxxxxxxxxxxxxxxxxxxxxxx
The disks are Atlas 10K2/15K models and are getting up there (5+ years for some) but I've had disk errors with this controller (which is also 5+ years old) before and they always show up in the logs. That is what has got me confused as I've never had an issue with Adaptec controllers - they are always bulletproof. Was yours an Adaptec controller as well?

I think I have a single-channel RAID card lying about, so I will set up a test system to test the drives as I swap them out. What did you use to test the drives?

You may be right about looking into a new controller and drive setup. What would you recommend: SAS, SATA or stick with U320?

Thx

--
Allan Williams



"Philip E." <groups@xxxxxxxxxxxxxxxx> wrote in message news:udGeIKlUIHA.5816@xxxxxxxxxxxxxxxxxxxxxxx
How old are the drives and the RAID controller?

Suggestion: Budget for a new box, or at least a new RAID setup.

We had Event ID 15s on a 6-drive RAID 5 array that turned out to be caused by bad sectors on one of the drives in the array, which neither the controller nor the OS picked up on. We only figured that out after we replaced the controller and drives and could test each drive individually.

But, we needed to go through some pain to get things back first.

Run chkdsk /f /r on the array. Note that it can take a huge chunk of time to complete.

If it is indeed bad sectors on one of the array members, the problem may not show itself again for a while. Part of the problem is that a chkdsk /f /r on the dismounted partition still may not fix things completely, as the bad sector problem can keep growing beyond the areas the OS marked as bad when chkdsk was run.
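
To see whether that growth is happening, one approach is to keep the report text from each chkdsk /f /r run and compare the "KB in bad sectors" figure that NTFS chkdsk prints in its summary. Keep in mind that on a hardware RAID array chkdsk only sees the logical disk the controller presents, so this counts only what reaches the OS. A minimal Python sketch, assuming each report was saved as text and uses the English summary wording; the function names and sample report lines are hypothetical.

```python
import re

# NTFS chkdsk prints a summary line such as "      0 KB in bad sectors."
BAD_SECTOR_LINE = re.compile(r"(\d[\d,]*)\s+KB in bad sectors", re.IGNORECASE)


def bad_sector_kb(report_text):
    """Return the 'KB in bad sectors' figure from a saved chkdsk report, or None."""
    match = BAD_SECTOR_LINE.search(report_text)
    return int(match.group(1).replace(",", "")) if match else None


def bad_sectors_growing(earlier_report, later_report):
    """True if the later chkdsk run reports more KB in bad sectors than the earlier one."""
    before, after = bad_sector_kb(earlier_report), bad_sector_kb(later_report)
    if before is None or after is None:
        raise ValueError("no 'KB in bad sectors' line found in one of the reports")
    return after > before


if __name__ == "__main__":
    # Hypothetical excerpts from two saved reports, for illustration only.
    first_run = "       0 KB in bad sectors."
    second_run = "      48 KB in bad sectors."
    print("bad sectors growing:", bad_sectors_growing(first_run, second_run))
```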
--

Philip E.
MPECS Inc.
Microsoft Small Business Specialists
http://blog.mpecsinc.ca
"Al Williams" <donotreplydirect@xxxxxxxxxxxxxxxx> wrote in message news:eQeaBqkUIHA.4696@xxxxxxxxxxxxxxxxxxxxxxx
SBS2003 Premium SP2

Recently our Adaptec 2100 SCSI U160 RAID5 array went down and caused a bunch of errors like this in the system log:

Event Type: Error
Event Source: dpti2o
Event Category: None
Event ID: 15
Date: 08-01-07
Time: 8:18:55 AM
User: N/A
Computer: HISERVER
Description:
The device, \Device\Scsi\dpti2o1, is not ready for access yet.
----------------------
Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 08-01-07
Time: 8:18:55 AM
User: N/A
Computer: HISERVER
Description:
The driver detected a controller error on \Device\Harddisk0.

This RAID array contains some data plus the Exchange and SQL databases. Other drives are on a separate RAID mirror, so the server continued to run as best it could but needed several reboots to get going Monday morning (lots of SQL and Exchange errors in the logs due to the drive going offline). After rebooting a few times the server has been fine ever since (I did a full backup as well).

The strange thing is I checked the Adaptec Storage Manager logs and it shows no errors. As far as it is concerned the RAID array is fine - none of the disks show any faults and the array did not degrade.

No changes have been made recently to the server except for a couple of the usual monthly Windows Updates. I'm leaning towards this being a hardware problem, as the server locked up tight a few times while attempting to recover from the issue Monday (it gave a STOP 100000ea as well). Could the RAID card itself be having issues?

Any ideas on how I can determine what caused the RAID to go offline?

Thanks,

--
Allan Williams
