Re: IOCP critical sections and mutexes

Hi mate

It is a well-known fact that some otherwise invisible bugs reveal
themselves only under stress. Since you say that your application
works perfectly well even under stress if you use a mutex instead
of a critical section, your application itself is beyond suspicion
- the problem must lie with the way critical sections are implemented.

Apparently, you have accidentally discovered some hidden bug in
ntdll.dll that reveals itself only under stress. Since events,
mutexes and semaphores are system dispatcher objects, they are
dealt with by the kernel, so it would be easy to detect
bugs in their implementation if they had any - the standard response
to any kernel-mode bug is a BSOD. However, unlike events, mutexes and
semaphores, critical sections are implemented by the user-mode
ntdll.dll, so the only thing that they can crash is some
user-mode program. It is understandable that if your program crashes,
you first look for bugs in the program itself, rather than
suspecting the system services. Therefore, if the critical section
implementation has some not-so-easily-detectable bugs, there is a good
chance that, up to this point, they have simply been overlooked and/or
attributed to the programs that crash as a result of these bugs.
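
A rough way to check this suspicion in isolation is to hammer the lock
primitive alone, with no application code around it. The sketch below is
only an illustration, not anything from the original program: the name
stress_lock and its parameters are made up, and std::mutex is used as a
portable stand-in. On Windows, the Lock type could be a thin wrapper whose
lock()/unlock() call EnterCriticalSection/LeaveCriticalSection on an
initialized CRITICAL_SECTION.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hammers a lock from several threads. Each iteration increments a plain
// (non-atomic) counter inside the locked region, so a lost update would
// show up as a final count below threads * iters.
// Lock is an assumption: any type providing lock() and unlock().
template <typename Lock>
long stress_lock(int threads, long iters) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            for (long i = 0; i < iters; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling stress_lock<std::mutex>(4, 100000) should return exactly
threads * iters. If the same harness also passes with a critical section
wrapper on the machine that crashes, the lock itself becomes a much less
likely suspect.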

Anton Bassov



Alberto Demichelis wrote:
Hi all, I'm writing a server app based on a UDP protocol. I have to
handle a large number of packets per second, so I'm using IOCP to
handle my socket. Until this week I thought I'd nailed my IOCP
routines, but I recently started some preliminary stress tests on a
new server machine and I'm getting a strange crash that is driving
me mad.

My scenario is simple: I have a single UDP socket, I spawn 2 threads
per processor, and the maximum concurrency of my completion port is
set to 'number of processors'. I cap at 2 read requests per thread
and at most 5 write requests queued per thread. My packets never go
over 2k in size. Each IO thread does very little work: it just checks
whether the UDP packet is actually a valid frame for my app, then
pushes it into a queue that will later be processed by my main app
thread (it is a realtime simulation). Access to the queue is
synchronized using a critical section (I'm also testing with a mutex).
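
For reference, the queue arrangement described here can be sketched
roughly as follows. This is a minimal illustration under stated
assumptions, not the poster's code: Packet, PacketQueue and
run_queue_demo are hypothetical names, the real frame layout is not
given, and std::mutex stands in for the CRITICAL_SECTION so the sketch
stays portable. The point it shows is that the IO threads and the app
thread must both take the same lock around every access to the queue.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical packet type; the real frame layout is not given in the post.
struct Packet {
    int source;    // which I/O thread produced it
    int sequence;  // per-thread sequence number
};

// Queue shared by the I/O threads (producers) and the app thread (consumer).
class PacketQueue {
public:
    void push(const Packet& p) {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push_back(p);
    }

    // Moves everything queued so far into `out`; returns the number moved.
    std::size_t drain(std::vector<Packet>& out) {
        std::lock_guard<std::mutex> guard(lock_);
        const std::size_t n = items_.size();
        out.insert(out.end(), items_.begin(), items_.end());
        items_.clear();
        return n;
    }

private:
    std::mutex lock_;           // stand-in for the CRITICAL_SECTION
    std::deque<Packet> items_;
};

// Runs `threads` producers that each push `per_thread` packets while the
// calling thread drains; returns the total number of packets received.
std::size_t run_queue_demo(int threads, int per_thread) {
    PacketQueue queue;
    std::vector<std::thread> producers;
    for (int t = 0; t < threads; ++t) {
        producers.emplace_back([&queue, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) queue.push(Packet{t, i});
        });
    }
    std::vector<Packet> received;
    const std::size_t expected =
        static_cast<std::size_t>(threads) * static_cast<std::size_t>(per_thread);
    while (received.size() < expected) {
        queue.drain(received);
        std::this_thread::yield();
    }
    for (auto& p : producers) p.join();
    return received.size();
}
```

With a Windows critical section the shape would be the same: the
CRITICAL_SECTION is set up once with InitializeCriticalSection before any
thread touches it, and every push and drain is bracketed by
EnterCriticalSection/LeaveCriticalSection.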

My crash happens when I push a packet into the queue that goes to the
app. The thing that drives me crazy is that it only happens on my dual
Xeon (with hyperthreading, Win Server 2003) and only if I use critical
sections for synchronization. If I use a mutex instead, it works fine.

My test is very simple. I generate about 10Mb of traffic that is sent
to the app. Eventually, on my dual Xeon, I get an exception that looks
like a pure virtual function call. I reproduced the crash many times
and explored the stack, and everything looks just fine, except that my
stack seems to be shifted by 4 bytes (but I assume it is just VS's
debugger getting a bit confused by release builds). The problem seems
to be around the critical section. The critical section version
crashes almost immediately; the mutex version has been running for
hours under heavy stress and is flawless.

So far I had always tested it on my dev machine (dual-core Pentium,
Win XP Pro), and even if I generate a lot of traffic the app just
works fine with either a critical section or a mutex. On single-core
machines it obviously works well too. If I use single-threaded network
IO everything works fine, so I'd exclude a bug in the app itself.

Does anybody have any idea what could cause this? Is there something
that I shouldn't do with critical sections? I'm really curious because
I really don't understand why the mutex works as a solution.

thanks for your time
Alberto
