Re: Problem isolating blue screen of death problem


Gerry,

Thanks again for taking the time.

> A copy of the Stop Error report is needed if you want targeted help.
I'm not sure what 'the Stop Error report' is. If it's the details from
the kernel dump, here it is:
=======================================================
Loading Dump File [C:\Bioptigen\Kernel Dumps\MEMORY122208A.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is:
SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 2) MP (4 procs) Free x86
compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp.051011-1528
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055c700
Debug session time: Mon Dec 22 15:08:06.625 2008 (GMT-5)
System Uptime: 0 days 0:45:25.579
Loading Kernel Symbols
....................................................................................................................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 7ffd900c). Type ".hh dbgerr001" for details
Loading unloaded module list
............
*******************************************************************************
*
*
* Bugcheck Analysis
*
*
*
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck A, {c0605000, 2, 1, 805043d1}

Probably caused by : memory_corruption ( nt!MiAddWorkingSetPage+cf )

Followup: MachineOwner
---------

1: kd> !analyze -v
*******************************************************************************
*
*
* Bugcheck Analysis
*
*
*
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: c0605000, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on
chips which support this level of status)
Arg4: 805043d1, address which referenced memory

Debugging Details:
------------------

WRITE_ADDRESS: c0605000

CURRENT_IRQL: 2

FAULTING_IP:
nt!MiAddWorkingSetPage+cf
805043d1 c70680000000 mov dword ptr [esi],80h

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xA

PROCESS_NAME: MemTest.exe

TRAP_FRAME: b404ac44 -- (.trap 0xffffffffb404ac44)
ErrCode = 00000002
eax=0007a4cf ebx=0007a4cf ecx=00000041 edx=89714902 esi=c0605000 edi=c0883000
eip=805043d1 esp=b404acb8 ebp=b404acdc iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!MiAddWorkingSetPage+0xcf:
805043d1 c70680000000 mov dword ptr [esi],80h
ds:0023:c0605000=????????
Resetting default scope

LAST_CONTROL_TRANSFER: from 805043d1 to 805437d0

STACK_TEXT:
b404ac44 805043d1 badb0d00 89714902 81de66a4 nt!KiTrap0E+0x238
b404acdc 805051fb 89d56588 89d56588 c03f7620 nt!MiAddWorkingSetPage+0xcf
b404acf4 8051fbc1 c0883cfc 7eec4000 0012e81c nt!MiLocateAndReserveWsle+0xc1
b404ad4c 80543668 81de6d88 7eec4000 80000000 nt!MmAccessFault+0xfb5
b404ad4c 004020a1 81de6d88 7eec4000 80000000 nt!KiTrap0E+0xd0
WARNING: Frame IP not in any known module. Following frames may be wrong.
000000a8 00000000 00000000 00000000 00000000 0x4020a1


STACK_COMMAND: kb

FOLLOWUP_IP:
nt!MiAddWorkingSetPage+cf
805043d1 c70680000000 mov dword ptr [esi],80h

SYMBOL_STACK_INDEX: 1

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP: 434c50c7

SYMBOL_NAME: nt!MiAddWorkingSetPage+cf

IMAGE_NAME: memory_corruption

FAILURE_BUCKET_ID: 0xA_W_nt!MiAddWorkingSetPage+cf

BUCKET_ID: 0xA_W_nt!MiAddWorkingSetPage+cf

Followup: MachineOwner
=======================================================
The bug check code and Args 1, 2, and 3 never vary: it is always
IRQL_NOT_LESS_OR_EQUAL, always a write to address 0xC0605000, always at
IRQL 2 (DISPATCH_LEVEL). For the first several weeks of testing, Arg4 never
varied either (0x805043d1); in the last couple of days, we've seen other
addresses there. A few days ago, I removed some drivers we were suspicious
of (the National Instruments drivers we use to acquire images) and some we
weren't using (Roxio DVD burning). My speculation is that the address change
is related to that: removing those drivers changed the driver load order in
some way.
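To tabulate these parameters across many crashes, a small decoder can help. The following is a minimal Python sketch, not part of WinDbg or any Microsoft tool; the function name `decode_bugcheck_a` and its dictionary keys are hypothetical, and the field meanings are taken from the `!analyze -v` output above (Arg3 bit 0 = write, bit 3 = execute).

```python
# Hypothetical helper (not a WinDbg command): decode the four parameters of
# bug check 0xA (IRQL_NOT_LESS_OR_EQUAL) as described by !analyze -v.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def decode_bugcheck_a(arg1: int, arg2: int, arg3: int, arg4: int) -> dict:
    return {
        "memory_referenced": f"0x{arg1:08x}",            # Arg1: address touched
        "irql": IRQL_NAMES.get(arg2, f"IRQL {arg2}"),     # Arg2: IRQL at the fault
        "operation": "write" if arg3 & 0x1 else "read",   # Arg3 bit 0
        "execute": bool(arg3 & 0x8),                      # Arg3 bit 3
        "faulting_ip": f"0x{arg4:08x}",                   # Arg4: instruction address
    }


if __name__ == "__main__":
    # Values from "BugCheck A, {c0605000, 2, 1, 805043d1}" in the dump above.
    for key, value in decode_bugcheck_a(0xC0605000, 2, 1, 0x805043D1).items():
        print(f"{key}: {value}")
```

Applied to the dump above, it reports a write to 0xc0605000 at DISPATCH_LEVEL from 0x805043d1; in the more recent crashes only the `faulting_ip` field would differ.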

>Disable automatic restart on system failure
We have done so--there's plenty of time to look at that screen.

>The inference from what you have written is that the Errors are not
>occurring during the boot process. Is this correct?
Yes, that is correct.

>This means that you can start to eliminate things that load when you boot.
Can I ask you to elaborate on this a little? Do you mean using MSConfig
to disable items on the Startup and Services tabs?
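One systematic way to do that kind of elimination is to bisect: disable half of the candidate startup items and services, test, and keep halving whichever set still reproduces the crash. Below is a minimal Python sketch of the idea, assuming exactly one culprit and a reliable test; `crashes_with` is a hypothetical stand-in for a manual test run with only the given items enabled, and the item names are made up.

```python
from typing import Callable, Sequence


def bisect_culprit(items: Sequence[str],
                   crashes_with: Callable[[Sequence[str]], bool]) -> str:
    """Find a single culprit by repeatedly halving the enabled set.

    Assumes exactly one item causes the crash and that crashes_with(subset)
    reliably reports whether the crash reproduces with only `subset` enabled.
    """
    candidates = list(items)
    if not candidates:
        raise ValueError("no candidate items to test")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reproduces the crash.
        candidates = first_half if crashes_with(first_half) else candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical item names; the lambda simulates a culprit for illustration.
    items = ["svc_a", "svc_b", "svc_c", "svc_d", "svc_e"]
    print(bisect_culprit(items, lambda enabled: "svc_d" in enabled))
```

Because these crashes are intermittent, a run without a crash is weak evidence, so each `crashes_with` step would need to run long enough (or be repeated often enough) to be trusted.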

>Have you tried to reproduce the error in safe mode?
No. It has been on our list of things to try, but we haven't gotten to it
yet. I will investigate.

>Are there any yellow question marks in Device Manager?
No.

>What errors are appearing in Event Viewer?
There are Warning-level messages that appear to be related in time to the
crashes. Each time a crash occurs, we see 15-25 instances (the count varies)
of the message "An error was detected on device\HardDisk0\D during a paging
operation." (There is no page file on drive D:, although we are running on
that disk.) The messages, in fact, reinforce our working hypothesis: that a
driver inappropriately raised the IRQL, and that a paging operation happened
to occur while the IRQL was raised.
There are no other Warnings or Errors in the System area.
There are a few error messages in the Application area, but none that
occur regularly--they appear to be side effects of the OS crashing: things
like explorer.exe or spoolsv.exe faulting, with perhaps 1 or 2 of each
spread over the period in which we've seen 40-50 crashes.
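The working hypothesis rests on a Windows kernel rule: a page fault on pageable memory can only be serviced at an IRQL below DISPATCH_LEVEL (2), because resolving it may require waiting for disk I/O. Below is a simplified Python model of that rule, for illustration only; the names are hypothetical, and it assumes the address is valid pageable memory (a completely invalid address is an error at any IRQL).

```python
from enum import IntEnum


class Irql(IntEnum):
    # Low x86 IRQLs relevant to this crash.
    PASSIVE_LEVEL = 0
    APC_LEVEL = 1
    DISPATCH_LEVEL = 2


def can_service_page_fault(irql: Irql) -> bool:
    # The memory manager may have to wait for a disk read to bring the page
    # in, and waiting is not allowed at DISPATCH_LEVEL or above.
    return irql < Irql.DISPATCH_LEVEL


def classify_access(irql: Irql, page_resident: bool) -> str:
    # Assumes the address is valid, pageable memory.
    if page_resident:
        return "ok"
    if can_service_page_fault(irql):
        return "page fault serviced"
    return "bug check 0xA (IRQL_NOT_LESS_OR_EQUAL)"


if __name__ == "__main__":
    for level in Irql:
        print(level.name, classify_access(level, page_resident=False))
```

In the dump above, CURRENT_IRQL is 2, so in this model any non-resident page touched at that point ends in the bug check case.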

>>A tip for posting copies of Error Reports...
The system that is crashing does not have network access--we're trying to
emulate field conditions, and our product is a medical device that will be
operated without a network connection. If you think it is critical, I can
get a copy of one of these Error messages to you using a thumb drive, but I
think I have copied it faithfully.

Again, thank you for your efforts.

PC