Re: Cancel IO problems on Server 2003

From: Eliyas Yakub [MSFT] (eliyasy_at_online.microsoft.com)
Date: 04/01/04


Date: Thu, 1 Apr 2004 06:40:26 -0800

Your crash is not related to using any old function. You just have some
classic bugs in your IRP cancellation/cleanup code path that show up easily
when more than one thread interacts with your driver. Read the IRP
cancellation material in Walter Oney's book, understand all the gotchas, and
then review your code. You will find the bugs. A few months ago we published
a white paper on IRP cancellation on the http://www.microsoft.com/whdc/
site, but I couldn't find a link to the article.
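
One of the gotchas in question is the race between the cancel path and the
normal completion path: both may try to finish the same IRP, and exactly one
of them must win. What follows is a minimal user-mode sketch of that
ownership handoff, not driver code and not the poster's driver. It assumes
only that the cancel routine slot is swapped atomically (which is how
IoSetCancelRoutine behaves) and that whoever swaps out a non-NULL value owns
the request. Every name in it (model_irp, model_cancel_irp,
model_complete_if_owned, and so on) is hypothetical and exists only for
illustration. A real driver additionally has to remove the IRP from its
queue under the queue spinlock and release the cancel spinlock inside the
cancel routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_irp;
typedef void (*cancel_fn_t)(struct model_irp *);

enum { MODEL_PENDING = -1, MODEL_SUCCESS = 0, MODEL_CANCELLED = 1 };

/* Hypothetical stand-in for the few IRP fields that matter to the race. */
struct model_irp {
    _Atomic(cancel_fn_t) cancel_routine; /* models the cancel routine slot */
    atomic_bool cancel;                  /* models Irp->Cancel */
    int completions;                     /* times the request was completed */
    int status;                          /* final status, MODEL_* */
};

static void model_irp_init(struct model_irp *irp)
{
    assert(irp != NULL);
    atomic_init(&irp->cancel_routine, (cancel_fn_t)NULL);
    atomic_init(&irp->cancel, false);
    irp->completions = 0;
    irp->status = MODEL_PENDING;
}

/* Atomically install a new routine and return the previous one. */
static cancel_fn_t model_set_cancel_routine(struct model_irp *irp,
                                            cancel_fn_t fn)
{
    return atomic_exchange(&irp->cancel_routine, fn);
}

static void model_complete(struct model_irp *irp, int status)
{
    irp->status = status;
    irp->completions++;
}

/* The driver's cancel routine: completes the request as cancelled. */
static void model_on_cancel(struct model_irp *irp)
{
    model_complete(irp, MODEL_CANCELLED);
}

/* Cancel path: mark the request cancelled, then claim the routine.
 * Returns false if the completion path already owns the request. */
static bool model_cancel_irp(struct model_irp *irp)
{
    atomic_store(&irp->cancel, true);
    cancel_fn_t fn = model_set_cancel_routine(irp, NULL);
    if (fn == NULL)
        return false;
    fn(irp);
    return true;
}

/* Completion path (e.g. after the device interrupt): complete only if
 * we cleared a non-NULL routine, i.e. the cancel path has not claimed it. */
static bool model_complete_if_owned(struct model_irp *irp)
{
    if (model_set_cancel_routine(irp, NULL) == NULL)
        return false; /* cancel path owns the request; do not touch it */
    model_complete(irp, MODEL_SUCCESS);
    return true;
}
```

Whichever path swaps the routine to NULL first completes the request, and
the other path backs off, so the request is completed once no matter how the
two threads interleave. Completing or freeing an IRP on both paths, or
touching it after losing this handoff, is the kind of bug that tends to
appear only when a second thread or processor is involved.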

More importantly, run your driver under Driver Verifier.

http://www.microsoft.com/whdc/hwdev/driver/verifier.mspx

-- 
-Eliyas
This posting is provided "AS IS" with no warranties, and confers no rights.
"Ron Jolly" <ron.jolly@gefanuc.com> wrote in message
news:c76164d.0403310757.5cc52f60@posting.google.com...
> Hello group
> I have an old monolithic driver that works OK on NT 4.0 and Win 2000
> but has problems on Win Server 2003.  I have included a crash dump.
>
> Since the device must wait a long time for device interrupts to occur,
> the driver manages its own IRPs via standby queues.  An API library
> contained in a DLL is used by application programs to access the
> device. When a user app wants to wait for a device interrupt, it calls
> an API function that opens the device, creates an event, starts a thread
> via _beginthreadex, calls WaitForMultipleObjects, and waits for
> an event to notify the user app when a device interrupt occurs.
> This all works OK on NT 4.0, Win2k and also 2003 Server.
> The problem occurs on 2003 server if the user app tries to cancel the
> operation. A stress test I wrote attempts to cancel the operation
> very quickly after creating the API thread in the DLL. If I start a
> single instance of the test app (i.e. one process with one thread),
> everything works fine and will run forever. I can cancel the I/O
> operation all day long. If a second asynchronous instance of the
> stress test is started while the first instance is still running, a
> crash will occur after about 5 minutes.
>
> As can be seen in the crash dump, the API library function
> RFM2gReceiveFromQueue has called WaitForMultipleObjects.
> WaitForMultipleObjects has returned WAIT_OBJECT_0 + 1 (we received
> a cancel). We then call the CancelIo function from within the API DLL.
> At that point the system traverses a path to a crash. I notice that
> one of the final system calls is always IoAcquireCancelSpinLock.
>
> This is an old driver and I thought that maybe some of the old DDK
> APIs being used in the driver may be having problems on Server 2003. I
> went through the driver code and reviewed the places where I am
> protecting queues with my own spinlocks. I found all calls to
> KeAcquireSpinLock() and replaced them with
> KeAcquireInStackQueuedSpinLock, and likewise replaced the analogous
> calls: KeAcquireSpinLockAtDpcLevel with
> KeAcquireInStackQueuedSpinLockAtDpcLevel, KeReleaseSpinLock with
> KeReleaseInStackQueuedSpinLock, etc.
>
> This of course would be incompatible with NT 4.0, but I wanted to see
> if it fixed any problems on Server 2003. It did not fix the crash
> problem on Server 2003.
> I ran PREfast on the driver and no problems were found.
> The DDK used to build the driver is the 3790 version, and the
> DLL and application are built with Visual Studio .NET 2003.
>
> I would like to take advantage of some of the newer DDK APIs (like
> cancel safe queues), but it must still work with NT 4.0 so I don't
> think I am able to use these APIs.
>
> Are there any Hotfixes available on Windows Server 2003 that I may
> need?
>
> The system is a Gigabyte GA-8IPXDR motherboard using a
> dual Xeon 2.8 GHz with 1 gigabyte of main memory. Hyperthreading is
> disabled in the BIOS.
>
> Begin crash dump
> ////////////////////////////////////////////////////////////////////////////
>
> Symbol search path is: C:\WINDOWS\Symbols
> Microsoft (R) Windows Debugger  Version 6.3.0011.2
> Copyright (c) Microsoft Corporation. All rights reserved.
>
> Loading Dump File [C:\WINDOWS\MEMORY.DMP]
> Kernel Complete Dump File: Full address space is available
>
> Symbol search path is: C:\WINDOWS\Symbols
> Executable search path is:
> Windows Server 2003 Kernel Version 3790 MP (2 procs) Free x86
> compatible
> Product: Server, suite: TerminalServer SingleUserTS
> Built by: 3790.srv03_rtm.030324-2048
> Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
> Debug session time: Tue Mar 30 15:39:57 2004
> System Uptime: 0 days 0:09:10.437
> Loading Kernel Symbols
> ................................................................................
> Loading unloaded module list
> ...
> Loading User Symbols
> .......
> *******************************************************************************
> *                                                                             *
> *                        Bugcheck Analysis                                    *
> *                                                                             *
> *******************************************************************************
>
> Use !analyze -v to get detailed debugging information.
>
> BugCheck A, {f772, 2, 1, 8074807e}
>
> *** WARNING: Unable to verify checksum for rfm2gdll_stdc.dll
> *** ERROR: Symbol file could not be found.  Defaulted to export
> symbols for MSVCR71D.dll -
> Probably caused by : ntkrnlmp.exe ( nt!KiTrap0E+224 )
>
> Followup: MachineOwner
> ---------
> 1: kd> !analyze -v
> *******************************************************************************
> *                                                                             *
> *                        Bugcheck Analysis                                    *
> *                                                                             *
> *******************************************************************************
>
> IRQL_NOT_LESS_OR_EQUAL (a)
> An attempt was made to access a pageable (or completely invalid)
> address at an
> interrupt request level (IRQL) that is too high.  This is usually
> caused by drivers using improper addresses.
> If a kernel debugger is available get the stack backtrace.
> Arguments:
> Arg1: 0000f772, memory referenced
> Arg2: 00000002, IRQL
> Arg3: 00000001, value 0 = read operation, 1 = write operation
> Arg4: 8074807e, address which referenced memory
>
> Debugging Details:
> ------------------
> WRITE_ADDRESS:  0000f772
>
> CURRENT_IRQL:  2
>
> FAULTING_IP:
> hal!KeAcquireQueuedSpinLockRaiseToSynch+3e
> 8074807e 8902             mov     [edx],eax
>
> DEFAULT_BUCKET_ID:  DRIVER_FAULT
>
> BUGCHECK_STR:  0xA
>
> LAST_CONTROL_TRANSFER:  from 804f4559 to 8074807e
>
> STACK_TEXT:
> b756fcf4 804f4559 80510f0e b756fd08 813e6ab8 hal!KeAcquireQueuedSpinLockRaiseToSynch+0x3e
> b756fcf8 80510f0e b756fd08 813e6ab8 85905bc0 nt!IoAcquireCancelSpinLock+0x9
> b756fd0c 805d8931 858a0658 b756fd64 0a4cfcf8 nt!IoCancelIrp+0x2d
> b756fd54 804dfd24 000007d0 0a4cfcf8 00000000 nt!NtCancelIoFile+0xb7
> b756fd54 7ffe0304 000007d0 0a4cfcf8 00000000 nt!KiSystemService+0xd0
> 0a4cfce4 77f4235b 77e6cd10 000007d0 0a4cfcf8 SharedUserData!SystemCallStub+0x4
> 0a4cfce8 77e6cd10 000007d0 0a4cfcf8 000007d0 ntdll!ZwCancelIoFile+0xc
> 0a4cfd00 10003a65 000007d0 0a4cff80 00000000 kernel32!CancelIo+0x12
> 0a4cfe64 10002cab 00222538 00000001 0a4cff5c rfm2gdll_stdc!RFM2gReceiveFromQueue+0x475 [c:\vsnet\rtbuild\rfm2gdll_stdc\rfm2gdll_stdc.c @ 2962]
> 0a4cff80 10203266 10009288 00000000 00000000 rfm2gdll_stdc!RFM2gCallbackDispatcher+0x7b [c:\vsnet\rtbuild\rfm2gdll_stdc\rfm2gdll_stdc.c @ 2232]
> WARNING: Stack unwind information not available. Following frames may be wrong.
> 0a4cffb8 77e4a990 00225590 00000000 00000000 MSVCR71D!beginthreadex+0x196
> 0a4cffec 00000000 102031b0 002254d8 00000000 kernel32!BaseThreadStart+0x34
>
> FOLLOWUP_IP:
> nt!KiTrap0E+224
> 804e2f58 833d40a3578000   cmp     dword ptr [nt!KiFreezeFlag (8057a340)],0x0
>
> SYMBOL_STACK_INDEX:  1
>
> FOLLOWUP_NAME:  MachineOwner
>
> SYMBOL_NAME:  nt!KiTrap0E+224
>
> MODULE_NAME:  nt
>
> IMAGE_NAME:  ntkrnlmp.exe
>
> DEBUG_FLR_IMAGE_TIMESTAMP:  3e8015c6
>
> STACK_COMMAND:  .trap ffffffffb756fc80 ; kb
>
> BUCKET_ID:  0xA_W_nt!KiTrap0E+224
>
> Followup: MachineOwner
> ---------
> 1: kd> ln 804f4559
> (804f4550)   nt!IoAcquireCancelSpinLock+0x9   |  (8053aeb0)   nt!IoAcquireVpbSpinLock
> 1: kd> ln 8074807e
> (80748040)   hal!KeAcquireQueuedSpinLockRaiseToSynch+0x3e   |  (80748090)   hal!KeReleaseInStackQueuedSpinLock
>
> ////////////////////////////////////////////////////////////////////////////
> End crash dump
>
> Any feedback welcome
>
> Thanks
> Ron Jolly
> VMIC - GE Embedded Systems

