Re: Adventures in Server restore
- From: "Andy Cobley" <acobley@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 21 Jun 2005 10:39:20 +0100
Again, many thanks for the reply.
This is in fact a server built in-house; these are usually very reliable.
It's a 3 GHz Pentium 4, not overclocked etc. I ran a copy of Aida32 on it:
the processor temperature last night was 60 C, while our other servers (in
different cases but with the same processor) are running at 40 C, quite a
difference.
I've been running Exmerge today, moving users off this server to an older 5.5
server so we can take this one down and rebuild it. I did notice that at
one point the CPU temperature hit 64 C and the server promptly blue screened.
Am I right in saying a 3 GHz P4 starts overheating at 68 C or thereabouts?
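
A rough sketch of how readings like the ones above could be checked against
a limit is given here. It is only an illustration: it does not read any
sensor, the readings are entered by hand from the figures in this thread,
and both the 68 C limit (the unconfirmed figure asked about above) and the
5 C margin are assumptions rather than vendor specifications. The name
classify is hypothetical.

```python
# Hypothetical helper: flag CPU temperature readings that approach a limit.
# ASSUMED_LIMIT_C is the unconfirmed 68 C figure asked about above, and
# MARGIN_C is an arbitrary safety margin; neither is a vendor specification.
ASSUMED_LIMIT_C = 68.0
MARGIN_C = 5.0


def classify(temp_c, limit_c=ASSUMED_LIMIT_C, margin_c=MARGIN_C):
    """Return 'ok', 'warning' or 'critical' for one reading in Celsius."""
    if temp_c >= limit_c:
        return "critical"
    if temp_c >= limit_c - margin_c:
        return "warning"
    return "ok"


# Readings quoted in this thread, entered by hand (no sensor access here).
readings = {
    "other servers": 40.0,
    "this server, last night": 60.0,
    "this server, before blue screen": 64.0,
}

for label, temp_c in readings.items():
    print(f"{label}: {temp_c:.0f} C -> {classify(temp_c)}")
```

With these assumed values the 64 C reading already lands in the warning
band while the 60 C reading does not, so the margin would need tuning once
the real limit for the processor is known.
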
Andy
"Ben Winzenz [Exchange MVP]" <ben_winzenz@NOSPAMdotmessageonedotcom> wrote
in message news:%23lebqJddFHA.2984@xxxxxxxxxxxxxxxxxxxxxxx
> If it is a brand-name box (HP/Compaq, Dell, IBM, etc.) there should be
> some software (Dell OpenManage, Compaq Insight Manager) that can give you
> that information. Regardless, the chipsets on the motherboard (at least
> most that are a few years old or newer) should have integrated temperature
> sensors, so it's just a matter of getting an app to read those.
>
> Usually, -1018's are indicative of disk subsystem problems, but it could
> also be bad memory. In addition to checking the temperature, I'd suggest
> running comprehensive system diagnostics on it, as well as making sure
> all firmware and drivers are up to date, especially on the RAID controller
> and backplane, etc.
>
> --
> Ben Winzenz
> Exchange MVP
> MessageOne
>
>
> "Andy Cobley" <acobley@xxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:eTgn3qcdFHA.2556@xxxxxxxxxxxxxxxxxxxxxxx
>> Thanks for that Ben.
>>
>> We are in fact running SP1. These errors seem to be random at the
>> moment; I strongly suspect a hardware error due to overheating.
>> Time to get some sort of temp monitor in the box, I think.
>>
>> Andy
>>
>> "Ben Winzenz [Exchange MVP]" <ben_winzenz@NOSPAMdotmessageonedotcom>
>> wrote in message news:uvT0WVcdFHA.3052@xxxxxxxxxxxxxxxxxxxxxxx
>>> To be clear, SP1 for Exchange includes some additional error correction
>>> that prevents -1018 errors due to flipped bits. If it is a physical
>>> RAID controller or disk subsystem problem, it will likely not solve the
>>> problem. Also, there is no need to run eseutil /d after installing SP1
>>> for Exchange. The article references the need to run eseutil /d IF you
>>> choose to use eseutil to repair the old damaged database. Otherwise, no
>>> need.
>>>
>>> --
>>> Ben Winzenz
>>> Exchange MVP
>>> MessageOne
>>>
>>>
>>> "Andy Cobley" <acobley@xxxxxxxxxxxxxxxxxxxxxx> wrote in message
>>> news:eZPTkmadFHA.1292@xxxxxxxxxxxxxxxxxxxxxxx
>>>> Shai,
>>>>
>>>> Thanks for that information. Very useful.
>>>>
>>>> Andy
>>>>
>>>> "Shai Netanel" <ShaiNetanel@xxxxxxxxx> wrote in message
>>>> news:34579114-676B-443D-8B74-F1384478E00F@xxxxxxxxxxxxxxxx
>>>>> Hello
>>>>> I have seen this problem before.
>>>>> It is a hardware problem; it can be a disk problem or a RAID / SCSI
>>>>> controller problem.
>>>>> SP1 for Exchange changes the database engine (Jet) and should fix
>>>>> this problem.
>>>>> (After installing SP1 you need to run eseutil /d)
>>>>>
>>>>> http://www.kbalertz.com/kb_Q314917.aspx
>>>>>
>>>>> Regards,
>>>>> Shai Netanel
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> "Andy Cobley" wrote:
>>>>>
>>>>>> I thought I'd pass on my experience in bringing back an Exchange
>>>>>> server that has gone bad. Although I had done some test backup and
>>>>>> restore exercises, a real restore turned out to be far from easy.
>>>>>>
>>>>>> On Friday night (around midnight) something happened to corrupt
>>>>>> priv1.edb. I'm not sure what happened (an online backup was running
>>>>>> at the time), but I started to get the following messages in the
>>>>>> event viewer.
>>>>>>
>>>>>> "Information Store (1932) First Storage Group: The database page read
>>>>>> from
>>>>>> the file "F:\mdbdata\priv1.edb" at offset 6660096
>>>>>> (0x000000000065a000) for
>>>>>> 4096 (0x00001000) bytes failed verification due to a page checksum
>>>>>> mismatch.
>>>>>> The expected checksum was 3713556033693036075 (0x3389338991c82e2b)
>>>>>> and the
>>>>>> actual checksum was 7734955069946726443 (0x6b5814a791c8082b). The
>>>>>> read
>>>>>> operation will fail with error -1018 (0xfffffc06). If this condition
>>>>>> persists then please restore the database from a previous backup.
>>>>>> This
>>>>>> problem is likely due to faulty hardware. Please contact your
>>>>>> hardware
>>>>>> vendor for further assistance diagnosing the problem. "
>>>>>>
>>>>>> The mail service did continue to run though.
>>>>>>
>>>>>> On Saturday, whilst trying to do an RSG restore from backups, things
>>>>>> went from bad to worse and eventually the server crashed and
>>>>>> continued to crash with random errors in the event log. I thought it
>>>>>> might be a virus, but a check revealed not. I did manage to use
>>>>>> Exmerge to extract the users' mailboxes to PST.
>>>>>>
>>>>>> The server continued to crash, sometimes not letting me log in. I
>>>>>> decided a dialtone recovery would be best. However, I couldn't
>>>>>> create a new blank database. Thanks to Rich Matheisen for pointing
>>>>>> out that to create the new dialtone database I needed to move the
>>>>>> log files to a new location, essentially deleting them from the
>>>>>> server. So now I've got a dialtone database and people can send and
>>>>>> receive mail.
>>>>>>
>>>>>> Next up, use Exmerge to bring back the PST files from Saturday. Bad
>>>>>> move: one of them must have contained the data that was corrupting
>>>>>> the database, and the server started crashing again with blue screen
>>>>>> dumps. I have to admit I was losing what little hair I had left. For
>>>>>> good measure the IIS file MetaBase.xml had become corrupt (presumably
>>>>>> because of the crashes), causing Exchange services not to start
>>>>>> correctly. This was restored from a system backup.
>>>>>>
>>>>>> The only solution was to disable the Exchange services to get the
>>>>>> server stable, then create a new dialtone database (thus losing any
>>>>>> mail received this morning; a backup couldn't be created because the
>>>>>> server was only up for a couple of minutes at a time). This has now
>>>>>> allowed me to get the server running again. I'm going to run it for
>>>>>> a day like this just to confirm it's stable.
>>>>>>
>>>>>> However, I can't restore from the PST files from Saturday, so I'm
>>>>>> going to have to go back to older backup files to get the users'
>>>>>> mail back.
>>>>>>
>>>>>> Andy C