Re: Adventures in Server restore



Again, many thanks for the reply.

This is in fact an in-house built server; these are usually very reliable.
It's a 3 GHz Pentium 4, not overclocked. I ran a copy of Aida32 on it: the
processor temperature last night was 60 C, while our other servers (different
cases but the same processor) are running at 40 C, quite a difference.

I've been running Exmerge today, moving users off this server to an older 5.5
server so we can take this one down and rebuild it. I did notice that at
one point the CPU temperature hit 64 C and the server promptly blue screened.
Am I right in saying a 3 GHz P4 starts overheating at 68 C or thereabouts?

Andy



"Ben Winzenz [Exchange MVP]" <ben_winzenz@NOSPAMdotmessageonedotcom> wrote
in message news:%23lebqJddFHA.2984@xxxxxxxxxxxxxxxxxxxxxxx
> If it is a brand-name box (HP/Compaq, Dell, IBM, etc.) there should be
> some software (Dell OpenManage, Compaq Insight Manager) that can give you
> that information. Regardless, the chipsets on the motherboard (at least
> most that are a few years old or newer) should have integrated temperature
> sensors, so it's just a matter of getting an app to read those.
>
> Usually, -1018s are indicative of disk subsystem problems, but they could
> also be caused by bad memory. In addition to checking the temperature, I'd
> suggest running comprehensive system diagnostics on it as well as making
> sure all firmware and drivers are up to date, especially on the RAID
> controller and backplane.
>
> --
> Ben Winzenz
> Exchange MVP
> MessageOne
>
>
> "Andy Cobley" <acobley@xxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:eTgn3qcdFHA.2556@xxxxxxxxxxxxxxxxxxxxxxx
>> Thanks for that Ben.
>>
>> We are in fact running SP1. These errors seem to occur at random at the
>> moment; I strongly suspect a hardware error due to overheating.
>> Time to get some sort of temp monitor in the box, I think.
>>
>> Andy
>>
>> "Ben Winzenz [Exchange MVP]" <ben_winzenz@NOSPAMdotmessageonedotcom>
>> wrote in message news:uvT0WVcdFHA.3052@xxxxxxxxxxxxxxxxxxxxxxx
>>> To be clear, SP1 for Exchange includes some additional error correcting
>>> that prevents -1018 errors due to flipped bits. If it is a physical
>>> RAID controller or disk subsystem problem, it will likely not solve the
>>> problem. Also, there is no need to run eseutil /d after installing SP1
>>> for Exchange. The article references the need to run eseutil /d IF you
>>> choose to use eseutil to repair the old damaged database. Otherwise, no
>>> need.
>>>
>>> --
>>> Ben Winzenz
>>> Exchange MVP
>>> MessageOne
>>>
>>>
>>> "Andy Cobley" <acobley@xxxxxxxxxxxxxxxxxxxxxx> wrote in message
>>> news:eZPTkmadFHA.1292@xxxxxxxxxxxxxxxxxxxxxxx
>>>> Shai,
>>>>
>>>> Thanks for that information. Very useful.
>>>>
>>>> Andy
>>>>
>>>> "Shai Netanel" <ShaiNetanel@xxxxxxxxx> wrote in message
>>>> news:34579114-676B-443D-8B74-F1384478E00F@xxxxxxxxxxxxxxxx
>>>>> Hello,
>>>>> I have seen this problem before.
>>>>> It is a hardware problem; it can be a disk problem or a RAID / SCSI
>>>>> controller problem.
>>>>> SP1 for Exchange changes the Jet database engine and should fix this
>>>>> problem. (After installing SP1 you need to run eseutil /d.)
>>>>>
>>>>> http://www.kbalertz.com/kb_Q314917.aspx
>>>>>
>>>>> Regards,
>>>>> Shai Netanel
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> "Andy Cobley" wrote:
>>>>>
>>>>>> I thought I'd pass on my experience in bringing back an Exchange server
>>>>>> that has gone bad. Although I had done some test backup and restore
>>>>>> exercises, a real restore turned out to be far from easy.
>>>>>>
>>>>>> On Friday night (around midnight) something happened to corrupt
>>>>>> priv1.edb. I'm not sure what happened (but an online backup was going
>>>>>> on). I started to get the following messages in the event viewer.
>>>>>>
>>>>>> "Information Store (1932) First Storage Group: The database page read
>>>>>> from the file "F:\mdbdata\priv1.edb" at offset 6660096
>>>>>> (0x000000000065a000) for 4096 (0x00001000) bytes failed verification
>>>>>> due to a page checksum mismatch. The expected checksum was
>>>>>> 3713556033693036075 (0x3389338991c82e2b) and the actual checksum was
>>>>>> 7734955069946726443 (0x6b5814a791c8082b). The read operation will fail
>>>>>> with error -1018 (0xfffffc06). If this condition persists then please
>>>>>> restore the database from a previous backup. This problem is likely
>>>>>> due to faulty hardware. Please contact your hardware vendor for
>>>>>> further assistance diagnosing the problem."
>>>>>>
>>>>>> The mail service did continue to run though.
>>>>>>
>>>>>> On Saturday, whilst trying to do an RSG restore from backups, things
>>>>>> went from bad to worse and eventually the server crashed and continued
>>>>>> to crash with random errors in the event log. I thought it might be a
>>>>>> virus, but a check revealed not. I did manage to use Exmerge to
>>>>>> extract the users' mailboxes to PST.
>>>>>>
>>>>>> The server continued to crash, sometimes not letting me log in. I
>>>>>> decided a dialtone recovery would be best. However, I couldn't create
>>>>>> a new blank database. Thanks to Rich Matheisen for pointing out that
>>>>>> to create the new dialtone database I needed to move the log files to
>>>>>> a new location, essentially deleting them from the server. So now I've
>>>>>> got a dialtone database and people can send and receive mail.
>>>>>>
>>>>>> Next up, use Exmerge to bring back the PST files from Saturday. Bad
>>>>>> move: one of them must have contained the data that was corrupting the
>>>>>> database, and the server started crashing again with blue screen
>>>>>> dumps. I have to admit I was losing what little hair I had left. For
>>>>>> good measure the IIS file MetaBase.xml had become corrupt (presumably
>>>>>> because of the crashes), causing Exchange services not to start
>>>>>> correctly. This was restored from a system backup.
>>>>>>
>>>>>> The only solution was to disable the Exchange services to get the
>>>>>> server stable, then create a new dialtone database (thus losing any
>>>>>> mail received this morning; a backup couldn't be created because the
>>>>>> server was only up for a couple of minutes at a time). This has now
>>>>>> allowed me to get the server running again. I'm going to run it for a
>>>>>> day like this just to confirm it's stable.
>>>>>>
>>>>>> However, I can't restore from the PST files from Saturday, so I'm
>>>>>> going to have to go back to older backup files to get the users' mail
>>>>>> back.
>>>>>>
>>>>>> Andy C
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
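
For anyone wanting to pull the numbers out of a -1018 event like the one quoted above, here is a minimal Python sketch. The regular expression, the function name parse_checksum_event and the field names are my own and purely illustrative; they are not part of any Exchange tool. It only checks that the offset lands on a 4096-byte boundary and that -1018 and 0xfffffc06 are the same value as a signed and unsigned 32-bit number. It does not attempt to map the offset to an ESE logical page number, since that depends on how the database header pages are counted.

```python
import re

# Event text as quoted in the thread (line wraps removed).
EVENT_TEXT = (
    'Information Store (1932) First Storage Group: The database page read from '
    'the file "F:\\mdbdata\\priv1.edb" at offset 6660096 (0x000000000065a000) for '
    '4096 (0x00001000) bytes failed verification due to a page checksum mismatch. '
    'The expected checksum was 3713556033693036075 (0x3389338991c82e2b) and the '
    'actual checksum was 7734955069946726443 (0x6b5814a791c8082b). The read '
    'operation will fail with error -1018 (0xfffffc06).'
)

# Hypothetical pattern written against the wording above; other builds
# may phrase the event differently.
PATTERN = re.compile(
    r'file "(?P<path>[^"]+)" at offset (?P<offset>\d+) \(0x[0-9a-fA-F]+\) for '
    r'(?P<length>\d+) \(0x[0-9a-fA-F]+\) bytes.*?'
    r'expected checksum was \d+ \((?P<expected>0x[0-9a-fA-F]+)\).*?'
    r'actual checksum was \d+ \((?P<actual>0x[0-9a-fA-F]+)\).*?'
    r'error (?P<error>-?\d+) \((?P<error_hex>0x[0-9a-fA-F]+)\)',
    re.DOTALL,
)


def parse_checksum_event(text):
    """Extract file, offset, checksums and error code from a -1018 event."""
    m = PATTERN.search(text)
    if m is None:
        raise ValueError("text does not look like a page checksum mismatch event")
    offset = int(m["offset"])
    length = int(m["length"])
    error = int(m["error"])
    return {
        "path": m["path"],
        "offset": offset,
        "length": length,
        # Zero-based index of the 4096-byte block within the file.
        "block_index": offset // length,
        "aligned": offset % length == 0,
        "expected": int(m["expected"], 16),
        "actual": int(m["actual"], 16),
        "error": error,
        # -1018 viewed as an unsigned 32-bit value should equal the hex form.
        "error_hex_matches": (error & 0xFFFFFFFF) == int(m["error_hex"], 16),
    }


if __name__ == "__main__":
    for key, value in parse_checksum_event(EVENT_TEXT).items():
        print(f"{key}: {value}")
```

Collecting these fields across several events makes it easier to see whether the failures hit the same file offset every time or move around, which is the sort of detail worth having to hand when running the hardware diagnostics.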

