Re: PowerEdge 1800 Spontaneous Reboots
- From: "Dave Nickason [SBS MVP]" <gwdibble@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 26 Sep 2006 10:58:56 -0400
Several thoughts in no particular order:
You're obviously going to have to find some time to do diagnostics on this
box, maybe on a weekend or evening. But you should be able to check some
things out while it's in production.
When you say "crashing," are you referring to blue screens, spontaneous
reboots, or an unresponsive server? What, exactly, are the symptoms you're
seeing?
Are there any other errors or warnings in the logs, even outside of the time
you're seeing the server crashing?
What AV are you running? Are you confident that it's configured and
operating normally? AV, particularly Exchange AV, is a frequent source of
some of these mystery problems.
I would call Dell and explain the situation with the RAID. Have them walk
you through the procedure of reattaching the functioning drive to the
controller and rebuilding the mirror onto the second drive.
I doubt it's the UPS, but restarts caused by UPSs are reported fairly commonly.
I'd disconnect the signaling cable (serial or USB) from the server while
you're going through this, just in case.
Same for temperature. Is the server in a well-ventilated area of a room
that's under 90 F? If not, that could be it.
I'm guessing that you're running OpenManage Server Administrator, since you
know the processor temperature. Have you found anything in there? From the
main page in OM, click Logs to see if the embedded server management is
reporting anything of interest.
Please don't think about reinstalling the OS, or anything else, at this
point. For one thing, I never recommend doing that in the absence of a firm
diagnosis. More importantly in your case, this could easily be hardware,
and all that work would leave you right where you are today.
"Gregory Orton : SBS Admin" <ignoranceisbliss@xxxxxxxxx> wrote in message
news:1159275002.939508.160180@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
***** Please Read Carefully Before Posting an Answer *****
I am currently an administrator of an SBS installation and have some
serious issues.
First, some specs:
OS : Windows 2000 (not 2003!) // SBS 2000
System: Dell PowerEdge 1800 server, consisting of:
Dual Xeon 3.2 GHz processors
2 GB DDR II RAM
2 x 160 GB SATA HDD
1 x Dell CERC 1.5/6ch RAID controller
1 x APC Smart-UPS
1 x DVD-ROM drive
1 x floppy drive
2 x Dell power supplies
1 x Dell PowerVault Travan tape drive
Age: The server was bought, along with the UPS, in May 2006, and was not
brought into production service until July 2006; the UPS was not
brought online until August 2006.
Problem Number 1: I am at my wits' end. My Dell PowerEdge 1800 server
keeps crashing spontaneously. The only message in the event log is
"Server shutdown at [time] was unexpected." Trying to diagnose
this problem is extremely difficult, as the server is in a production
environment and in constant use, so I cannot run any diagnostic tests
to look at the problem. These crashes have only happened 2 or 3 times,
but always with the same message.
The timing of the crashes is the most baffling thing. It's either random
or at around 10.15 am, or 1.30 am (that only happened once), and
there are no discernible scheduled processes that run at these times,
except a script for the intranet that sends out a reminder email.
I thought it might be the UPS, but I am using the latest possible
version of PowerChute, and the log files don't show any loss of power or
shutdown process.
I thought it could be the RAID controller, but no HDDs are attached to
it right now (see Problem 2).
I thought it might be overheating, but the processors always idle at
~58 degrees, which I know is hot, but they've always done that, and the
maximum threshold is set to 120 (!)
Problem Number 2: On a related note, last Monday, 18 September, our
server crashed at 1.38 am. This is the first and only time it has
done so at this time.
When I arrived at work and attempted a reboot, the server booted to the
Win2K (not 2003!) splash screen and hung, then blue-screened with
INACCESSIBLE_BOOT_DEVICE.
At this time, we had our two 160 GB SATA HDDs attached to the CERC
1.5/6ch RAID controller on channel 0 and channel 1. This was built as a
RAID 1 mirror array spanning the whole of the 160 GB. This array is
split into 5 virtual disks:
C: OS
E: Windows Server Update Services
G: Company
I: Intranet
O: Profiles
This was done to preserve data integrity and enable us to ghost /
back up certain data where our backup media was too small.
Because of this BSOD error message, I did not want to risk
rebuilding the MBR or boot record with both disks plugged into the
array. So I removed one and then used the Win2K Recovery Console to
repair the MBR and boot record.
From here the single disk booted fine, whilst still plugged into the
array controller. This, I thought, was the start of the recovery. So I
shut down and plugged the second disk in, in the vain hope that the
array would rebuild from the working disk.
Obviously this was flawed reasoning, as there was no way of telling
the CERC controller which disk to rebuild the array from. Still, after
I turned the server on again, the controller reported during POST that
the array disks were present, that the array had FAILED, and then "no boot
device found" (!)
This scared me, as both disks were plugged in and one should have been
working. So I turned it off again and tried each disk individually, on
different channels. The controller recognised the disk on a different
channel, so that made negligible difference, but the most important
thing was that I was still getting "no boot device found".
This made me panic a little, as it meant I now had two unbootable
disks with my entire business' production environment on them.
In a last-ditch attempt I plugged the "fixed" HDD into a SATA port
directly on the motherboard, and lo and behold, it booted!
The server has been at this stage ever since. We are working from 1 x
160 GB drive plugged straight into the SATA port. This is where we were
before when we had our old server! The reason for this server upgrade
was to provide the data integrity / security / redundancy that RAID
offers!
I need advice from someone now. I have 1 x 160 GB drive with the latest
data on it, plugged into a SATA port; 1 x 160 GB drive unplugged and
dormant; and 1 x CERC 1.5/6ch RAID controller with nothing plugged into
it.
I need to know if it is possible for me to return this single
160 GB SATA non-RAID setup to a RAID 1 mirror. I am assuming the
out-of-date HDD with old data is still intact. Shouldn't it be as
simple as plugging them both in and telling the RAID controller to
build an array from one disk, without having to wipe the disks, rebuild
the array and then reinstall the OS? That would be more hassle than it's worth!
Help me!