Re: Hardware - disk failure corrupted OS....
- From: "Bobby Plim" <Bobby_Plim@xxxxxxxxxxxxxxxx>
- Date: Sat, 7 Jan 2006 13:16:10 +0100
We had similar problems over here in France with a PE 1600.
We were having disk errors that were not being reported in the Dell array
manager.
(The blind sentinel syndrome).
Disk errors (more than 250) were visible, but only via Ctrl+M at boot.
The symptoms were that our database started finding corrupted files and
shutting them down.
We restored previous clean versions of those files, and these too were
eventually taken offline by the DBMS.
Dell array manager was reporting no errors.
Hopeless Dell Gold Support. About 5 different people (software, hardware,
peripherals, drivers, ...) said they would "call back" and didn't. Some of
them have since "left the company".
A nonexistent "TAM" (a something account manager: supposed to be "total",
more like "temporary"), even though he is part of the Gold support contract
that you paid for up front.
He is supposed to be the eagle-eyed overall Dell support manager for our
entire organisation. Then they tell us that there is one TAM per incident!
That contradicts the terms of the contract.
They sent in a new disk 6 hours late (read the contract), and then a
techie arrived to install it. He immediately lost the boot partition, maybe
not his fault. He then reinstalled Windows Server 2003, but on a 4 GB C:
partition! Then he disappeared out the door.
I had to wipe all that and reinstall everything: OS, DBMS, data, drivers,
everything.
Then we had to reinstall the new Dell array manager, which is supposed to
report errors. But practically nobody in Dell Europe knows how to do it.
In fact, there is a secret command-line switch that you have to use when you
install the Dell OpenManage software so that the new Dell array manager
actually gets installed. I can't remember it offhand, but there is at least
one person at Dell who knows it, and I could track him down for you if need
be.
And there's no way of knowing whether the array manager's disk error tracking
system is functioning or not. Is the sentinel reporting nothing because a)
there is nothing to report, or b) it cannot communicate with the other
layers of software? The only real way of knowing that your disks are
cracking up is to press Ctrl+M during boot and then check what I call the
"BIOS PERC logs".
Yes indeed, you do look pretty stupid when you explain that to your boss or
your customers.
Are we getting a good deal from Dell on these PERC machines?
OK, they're cheap, but isn't the "write delay" causing more problems than
it's worth?
Shouldn't we be using "write through"?
Maybe this is off topic?
Bobby
"/kj" <kj@xxxxxxxxxxx> wrote in message news:
OM0xWC2EGHA.2648@xxxxxxxxxxxxxxxxxxxxxxx
> Any reason drive #1 wasn't just yanked?
>
> Not a personal favorite of mine, but shouldn't the PERC rebuild using the
> hot spare? Probably too late now though.
>
> Sympathies in any case, fwiw.
>
> /kj
> "Lanwench [MVP - Exchange]"
> <lanwench@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
> message news:u6iQpp1EGHA.2040@xxxxxxxxxxxxxxxxxxxxxxx
>> Argle - one of my clients is a small office with a newish (within a year)
>> Dell PowerEdge 2800 with SBS2003 (standard) - running like a dream ere
>> now. Server has five 73GB SCSI drives in a single RAID5 array (four plus
>> one hot spare). The server was getting slow for the users, as reported in
>> a phone call I got while out of town for the holidays - they did a reboot
>> and the PERC reported a mismatch between the drives and the card config;
>> they pressed the any key to continue and it finally came back up. Other
>> than being slow, it still puttered along -
>>
>> After I got back to town, I went in - saw some more of the 'delayed
>> write' errors in the event logs, and saw that the performance was indeed
>> slow as molasses. Placed a service call w/Dell, who had me run some
>> diagnostics - we saw in one that drive #1 was reporting a boatload of
>> errors (imminent failure, SMART) but that it hadn't actually *failed*.
>> Tech had me update the BIOS, too - no help, but it was an old BIOS
>> revision so I'm glad it's updated. Ran the utility partition diagnostics,
>> and during the tests on disk 1 I got tired of pressing Yes to continue
>> after the 100th time (no joke) so I called Dell back....they arranged for
>> hardware replacement. In the meantime, the poor server now won't boot at
>> all - it starts, then gets to the "\windows\system32\config\system is
>> missing or corrupt" msg, no getting round it.
>>
>> Dell sent out a tech to replace the drive & backplane yesterday, and they
>> were kind enough to just leave the spare HD there for me to work on it -
>> so I have to go see what I can do. I *think* they got a good tape backup
>> on Tuesday - but the server was acting so d__d slow it took a long time
>> to run, so that tape didn't eject til far after the job should have
>> normally completed....and now I don't have access to the backup logs,
>> natch.
>>
>> I'm really irritated that the RAID didn't fail the drive and do its job,
>> or beep or anything. Apparently it didn't think the drive was that bad. I
>> tend to disagree - it's corrupted the OS. I freakin' *hate* hardware.
>> I've never had to deal with this sort of thing with SBS before - I may
>> well be calling PSS when I go in tomorrow - but any ideas? Prayers?
>>
>>
>
>