Re: WaitForSingleObject() will not deadlock



But see below...
On Wed, 04 Jul 2007 23:44:02 -0500, "Doug Harrison [MVP]" <dsh@xxxxxxxx> wrote:

On Wed, 04 Jul 2007 23:33:41 -0400, Joseph M. Newcomer
<newcomer@xxxxxxxxxxxx> wrote:

"Programming with POSIX Threads", page 89
<q>
1. Whatever memory values a thread can see when it creates a new thread can
also be seen by the new thread once it starts. Any data written to memory
after the new thread is created may not necessarily be seen by the new
thread, even if the write occurs before the thread starts.
*****
Seems an odd specification. Essentially, it means that threads cannot share variables
when they are running.
*****

It makes sense. Here's an illustration of what it's saying:

int x = 0; // global

***** Thread 1:

x = 2;
CreateThread;
// x = 3;

***** Thread 2:

if (x == 2)
puts("x == 2");

Rule (1) guarantees that when started, thread 2 observes x == 2, since
thread 1 set it to 2 before creating the second thread. However, if you
were to uncomment the line that sets x to 3, thread 2 might not see that
value, even if it were executed before thread 2 got around to testing it.
That is, thread 2 might still observe the value 2, even though from thread
1's perspective, x contains 3, at the moment thread 2 tested it.
****
Since this code is erroneous, there seems to be little point to worrying about whether
thread 2 sees x==2 or x==3. There is a race condition here, and there is no way the value
of x is guaranteed. This has nothing to do with memory cache semantics, or cache
coherency; it has to do with a program that is executing concurrently and
nondeterministically modifying a value in a way that is timing-sensitive and can produce
either of two possible answers in this case. At the time the thread runs, EITHER x==2 or
x==3, and there is absolutely no reason to expect it could have either value. In this
case, mutexes will not solve the problem because a mutex lock around the assignment does
not change the fundamental nondeterminism of the computation, and therefore locking is
irrelevant to the discussion.
****

The importance of (1) should be apparent when you imagine x is a "this"
pointer, a CString pointer, etc that you are passing to the thread through
the CreateThread LPVOID parameter. You certainly expect the new thread to
observe the same data (what's pointed to) the existing thread observed when
it called CreateThread.
*****
Unless the calling thread changes it, e.g,

CString s = _T("Hello");
CreateThread(MyThread, &s);
s = _T("World");

UINT MyThread(LPVOID p)
{
CString * s = (CString *)p;
display_the_string_somehow(*s);
return 0;
}

then the thread MyThread might show one of the following results:

"Hello"
"World"
"ííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííí"
"&*%$aX#Pq" <or something equally completely meaningless>
<takes an access fault>

No amount of synchronization is going to change the fact that this program is erroneous.
No rule about "memory visibility" is going to change the fact that this program is
erroneous.
No rules about cache coherency are going to change the fact that this program is erroneous
*****


2. Whatever memory values a thread can see when it unlocks a mutex (leaves
a synchronized method or block in Java), either directly or by waiting on a
condition variable (calling wait in Java), can also be seen by any thread
that later locks the same mutex. Again, data written after the mutex is
unlocked may not necessarily be seen by the thread that locks the mutex,
even if the write occurs before the lock.
*****
I find this truly unbelievable. How can a mutex know what values were accessed during the
thread, so that it can ensure the values are going to be consistent if that same mutex is
locked? This strikes me as requiring immensley complicated bookkeeping on the part of the
mutex implementation. It seems so much easier to follow the semantics of most hardware
and just make sure that locking guarantees all pipes and caches are coherent across all
processors.

It makes sense. It's a high-level, abstract description of something you're
thinking about at the hardware level and taking very literally. How else
would you describe this in terms of mutexes? You wouldn't expect things to
work right if you used two mutexes pell-mell to protect the same piece of
data. No, you have to lock the same mutex, and (2) reflects that.

(The stuff about Java doesn't appear in the book. I guess the guy who wrote
the message I copied the excerpt from added it.)

While I first find it hard to imagine how it is possible to create a
situation of this nature on any closely-coupled MIMD architecture, I can imagine how it
can exist in a distributed MIMD architecture, but in that case, the mutice have to keep
track of every variable that is accessed within their scope, and ensure that an attempt to
lock a mutex can ensure that all remotely-cached data is sent back and all locally-cached
data is distributed out. An awesome task.

Of course that's not what it's saying. The requirement (2) can be fulfilled
in much simpler though coarser ways than you're thinking. Also keep in mind
that Butenhof was an architect of pthreads, a high-level, portable,
standardized multithreading library designed to run efficiently on existing
hardware, and he's not going to write about impossible requirements.
*****
Then why doesn't he write what he means, instead of writing something so clumsy that it is
subject to misinterpretation
*****

3. Whatever memory values a thread can see when it terminates, either by
cancellation, returning from its run method, or exiting, can also be seen
by the thread that joins with the terminated thread by calling join on that
thread. And, of course, data written after the thread terminates may not
necessarily be seen by the thread that joins, even if the write occurs
before the join.
*****
This implies that the notion of "join" exists as a fundamental concept. Most operating
systems that have threads do not seem to have this concept any longer; it seems to have
been more of a high-level concept when threads were implemented above the operating
system.

I wonder what bizarre era of programming this specification represents.

I don't know what you mean. Windows has WaitForSingleObject(hThread) which
is "join" by another name, and "join" exists by that name in .NET. I would
consider any multithreading model that doesn't provide a "join" facility as
fundamentally broken. To understand what (3) is saying, you can sort of
reverse my example for (1). I've talked many times about another important
use for "join", which is to "collect" all secondary threads, which is often
necessary to perform an orderly program shutdown.
*****
No, WaitForSingleObject is NOT a 'join'. A 'join' is where thread executions converge and
the joined threads terminate. We spent a lot of time on fork/join constructs back in the
1970s, and the Unix 'fork' was an outgrowth of this kind of thinking (but there's no
'join'). At a WFSO/WFMO, there is no loss of the identity of the threads; they keep
running.

There is a concept akin to fork/join in OpenMP, and it implements what we usually meant by
fork/join: at the join point, all working threads essentially terminate and therefore the
join point is a convergence of concurrent computations. After the join point, all
concurrency has stopped. WF[SM]O does not have these semantics.
*****

No surprises there. The MSDN link indicates this applies to other sync
objects plus CRITICAL_SECTION. It's nice for it to be stated, but for all
the reasons we've talked about, it couldn't be any other way. For Windows,
I expect you could add:

5. Whatever memory values a thread can see when it calls PostMessage or
SendMessage can also be seen by the thread that retrieves or processes the
message.
*****
Essentially, in Windows, any value in an memory location that can be seen by one thread
can be seen by any other thread at any time for any reason.

But absent synchronization of some form, whether or not two threads observe
the same value depends on the hardware. I'm postulating this is done in
rule (5); if it weren't, object hand-off protocols wouldn't work on
hardware such as IA64.
*****
Synchronization is not required for this; cache coherency in the hardware IS required.
*****

So the only issue is whether
or not the COMPILER has done something that keeps some local cache somewhere.

That's a separate issue.

The conservative optimizer of Microsoft C ensures this by essentially implementing "volatile"
semantics against any variable that is potentially modifiable by an arbitrary function
call. A C compiler is not required to do this, and it can still be a conforming C
compiler.

Don't mix this up with "volatile". To revisit my example from a couple of
messages ago, if a C compiler could look into the mutex lock/unlock
operations and see that they don't modify a global variable that is
intended to be protected by the critical section, allowing it to perform
optimizations that are at odds with the attempt at multithreaded
programming, yes, it would be a "conforming C compiler". So what? The C
Standard doesn't address multithreading, and such a compiler would be
useless for multithreaded programming. It would be the reincarnation of
ancient compilers unsuitable for multithreaded programming you described a
couple of messages ago.
*****
Since the standard does not address multithreading, a compiler that performed
optimizations that worked correctly in single-threading would indeed be a conforming
compiler, even if they produced incorrect code in a multithreaded environment. WHether or
not the compiler would be useful in a multithreading environment would be a pointless
discussion, because it would still be a correct compiler according to the standard. So
the question is, what compromises are made to allow C to work in a multithreaded
environment? One is to hijack the semantics of volatile to disable compiler optimizations
that would not be thread-safe, and otherwise let the compiler to agressive optimization.
Another is to do conservative optimizations such that the issues of multithreading do not
become problems. What Microsoft has done is to do a very, very conservative optimization
strategy so that the code continues to work.
*****
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.



Relevant Pages

  • Re: WaitForSingleObject() will not deadlock
    ... Whatever memory values a thread can see when it creates a new thread can ... data written after the mutex is ... standardized multithreading library designed to run efficiently on existing ... A C compiler is not required to do this, and it can still be a conforming C ...
    (microsoft.public.vc.mfc)
  • Re: How to properly initialize shared memory on Unix?
    ... As David said, mutex (and ... most of other synchronization primitives) enforce memory barrier, ... But it does not necessary enforce the compiler ...
    (comp.programming.threads)
  • Re: how to ensure other threads load new value from memory
    ... If mutex is implemented is such a way that compiler can prove that it ... "memory", which is a named invalidation (it says 'assume memory may ... the C programming standard is not sufficient to ...
    (comp.programming.threads)
  • Re: WWDC -- MacBook Pro?
    ... ....but once the program has loaded it into its program memory, ... boundaries, int16 are aligned on 2 byte boundaries, int32 are ... people are always able to come up with the compiler ... Case B, same glass, same ice code water.  ...
    (comp.sys.mac.system)
  • Re: Javas performance far better that optimized C++
    ... The compiler is extremely stupid. ... no memory leaks are guaranteed. ... However I have GC in my .NET programming. ... "C.9.1 Automatic Garbage Collection ...
    (comp.lang.cpp)