Re: Multiprocessor crash.
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Sat, 01 Mar 2008 19:03:27 -0500
A function should be inherently reentrant; you normally have to make a fair amount of
conscious effort to program badly enough to create a non-reentrant function. Do not
confuse reentrancy with concurrency problems. A function can be perfectly reentrant and
still use data that is shared with another thread; if that data is used without proper
synchronization, the code will cause problems. The key is, insofar as possible, to create
data structures that do not require concurrent access. See my essay "the best
synchronization is no synchronization", which suggests that certain design decisions can
result in a correct program where explicit synchronization using mutexes or
CRITICAL_SECTIONs is not required.
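
As a rough illustration of a design that needs no explicit synchronization, here is a
minimal sketch in portable C++. It is not taken from the essay; the function name
parallel_sum and the partitioning scheme are assumptions made for this example. Each
worker reads only its own slice of the input and writes only its own result slot, so no
data is ever accessed concurrently and no mutex or CRITICAL_SECTION appears anywhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): sums `values` using
// `num_threads` workers. No lock is needed because no two threads ever
// touch the same writable data.
long long parallel_sum(const std::vector<int>& values, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);  // one slot per worker
    std::vector<std::thread> workers;
    // Round up so every element lands in some worker's slice.
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, values.size());
        const std::size_t end = std::min(begin + chunk, values.size());
        workers.emplace_back([&values, &partial, t, begin, end] {
            long long local = 0;  // private to this thread
            for (std::size_t i = begin; i < end; ++i) {
                local += values[i];  // the shared input is only read
            }
            partial[t] = local;  // only worker t writes partial[t]
        });
    }
    // Joining makes each worker's write visible before the results are read.
    for (auto& w : workers) {
        w.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling parallel_sum on a vector holding 1 through 100 with four workers returns 5050.
The only coordination point is the join; the partial results are combined after every
worker has finished.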
Note that unless you need specific features of a mutex, such as cross-process
synchronization, timeouts, or the ability to wait for multiple synchronization events, a
CRITICAL_SECTION is a better choice for a synchronization primitive.
Setting thread affinity is a way of disguising bad code so that it gives the illusion of
working. If you require thread affinity to achieve correctness, your code is
intrinsically flawed, and you have to fix it. It almost always means you have a
synchronization failure.
Never lock code against concurrent access; that is always a bad design decision. You must
lock data against concurrent access, if you are doing locking at all. A common design
error made by beginners is to lock code. This can result in profoundly poor performance
on large multiprocessor systems, and merely suboptimal performance in smaller
configurations.
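
To make "lock data, not code" concrete, here is a minimal sketch in portable C++. The
Account class and the concurrent_deposits helper are hypothetical names invented for this
example, and std::mutex merely stands in for whatever primitive you would use (on Windows,
a CRITICAL_SECTION would play the same role). The lock is a member of the object whose
data it protects, so threads working on different objects never wait for each other. A
single global lock wrapped around the deposit code would serialize every caller, no
matter which account it touched.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example type: the lock lives with the data it guards.
class Account {
public:
    void deposit(long amount) {
        std::lock_guard<std::mutex> guard(lock_);  // locks this object's data only
        balance_ += amount;
    }

    long balance() const {
        std::lock_guard<std::mutex> guard(lock_);
        return balance_;
    }

private:
    mutable std::mutex lock_;  // protects balance_ and nothing else
    long balance_ = 0;
};

// Hypothetical driver: `num_threads` workers each make `times` deposits of
// `amount` into the same account, then the function waits for all of them.
void concurrent_deposits(Account& account, int num_threads, int times, long amount) {
    assert(num_threads > 0 && times >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&account, times, amount] {
            for (int i = 0; i < times; ++i) {
                account.deposit(amount);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Four workers each making 1000 deposits of 2 leave a balance of 8000 regardless of how the
threads are scheduled or how many cores they run on, which is the property that setting
an affinity mask only pretends to give you.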
Your inability to reproduce the problem on one core means only that you have made the
probability low enough that you can no longer see the error. It is still there, and your
code is still incorrect; you just haven't noticed it yet. You have to find and fix the bug.
joe
On Fri, 29 Feb 2008 16:08:12 -0500, rnd <rnd@xxxxxxxxxxxxxxxx> wrote:
Hi, Joseph M. Newcomer [MVP]
thanks everyone for the answers! I think that one of the issues is the
procedure re-entry problem. I created a stress test that significantly
increases the probability of re-entry.
Also, one of my colleagues pointed out that we could set the affinity in
the Windows Task Manager. So I found out how to do it programmatically
using the SetProcessAffinityMask function.
When I restrict the process to only one core, I'm unable to reproduce the
problem. Since I'm able to reproduce it in house, I'll generate the dump
file needed to go deeper in the investigation. When I have a crash with
the stack traces, I'll post them in this thread.
Thanks again and have a great weekend!
Pete
Jeffrey Tan[MSFT] wrote:
Hi Pete,
Thanks for your feedback.
Yes, multithreading issues can be highly timing-dependent and hard to
reproduce in a test. That's why design is the most essential part of a
multithreaded application. Once a problem occurs only in the production
environment, it is pretty hard to track down.
Given that this problem may be caused by a context switch or procedure
re-entry, there must be some synchronization or mutex problem in the code
that causes the crash. That's why I believe we should first get the stack
trace of the crash and find out why the crash occurs. If it is caused by
some state corruption, we should employ a synchronization mechanism to fix
it.
Anyway, please feel free to post more information. Thanks.
Best regards,
Jeffrey Tan
Microsoft Online Community Support
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm