Re: WaitForSingleObject() will not deadlock



See below...
On Fri, 06 Jul 2007 19:43:54 -0700, Frank Cusack <fcusack@xxxxxxxxxxx> wrote:

On Fri, 06 Jul 2007 21:47:51 -0400 Joseph M. Newcomer <newcomer@xxxxxxxxxxxx> wrote:
********************

It makes sense. Here's an illustration of what it's saying:

int x = 0; // global

***** Thread 1:

x = 2;
CreateThread;
// x = 3;

***** Thread 2:

if (x == 2)
puts("x == 2");

A. Rule (1) guarantees that when started, thread 2 observes x == 2, since
thread 1 set it to 2 before creating the second thread.
****
Yes, that would be true, if you don't have any other assignments
possible in any other thread. Essentially, there is an algorithmic
guarantee of data stability in such a case, and consequently, it
would not make sense for the thread to see a value OTHER than 2,
because the language semantics would insist that the assignment has
taken place at a time before launching the thread. Any hardware,
software, or compiler optimization that violated this would
represent an incorrect implementation of the language.
*****

No. Even though the assignment of value 2 (in thread 1) is guaranteed
to occur before the execution of "CreateThread" (this is the
compiler's guarantee), without the specific guarantee from the thread
specification that the new thread (thread 2) will see values written
before it was created, the *compiler* does not guarantee this. The
compiler only guarantees that *the thread which set x=2 and then
called "CreateThread"* will see x=2. For example, if CreateThread
modifies the value of x, the compiler guarantees that upon return from
CreateThread, thread 1 sees the new value of x. But just from the
compiler's guarantees, thread 2 is allowed to see the original value
of x=0, because the compiler does not guarantee when changes to data
are visible to other threads. This would not be broken and would
not be a violation, and in fact the compiler has no way to enforce
visibility.
*****
This is not an issue of "visibility". The address is always "visible". The compiler
cannot enforce anything, and the point is that stating that the values may not be seen
even if they are written before the thread starts is a misrepresentation of what is going
on. All that the specification needs to say is that there are no sequencing guarantees,
period. I'm saying that if the value is written before the thread starts, it is not the
domain of the compiler or run-time system to enforce anything, and what happens happens,
but to state it in terms of the execution instead of the formal semantics of the language
is the wrong approach. According to the language, if the assignment x=3 is made, it is
completed at the sequencing point that follows (volatile actually enforces this
limitation, allowing compiler writers certain degrees of freedom in doing optimizations,
as long as the optimizations do not change the semantics of the language). Note that
there is a difference between saying "a value is written", which means a store instruction
has been executed, and a specification which says "an assignment operation is performed",
which allows the compiler writer some wiggle room. If the value is truly written before
the thread starts, there is no way it can avoid seeing the new value. If the language
requires that the modification take place at the time of the assignment, then any
interpretation of the implementation which does *not* do this is erroneous.

Normally, we use an engineering compromise: for example, we define the behavior of the
compiler in terms of a single-threaded model, and then decide how far we are going to
allow the optimizer to distort the formal semantics so that appropriate postconditions are
maintained. The C language defines "sequencing points" at which certain postconditions of
the computation are guaranteed. What happens in between sequencing points is undefined.
However, semicolons are one of the sequencing points, so anything that happens at a
semicolon is supposed to be true, and if you decide that it has to be true in a
mulithreaded assumption, then you cannot allow an optimization that would violate this.
This would mean that all kinds of optimizations become illegal in a multithreaded
environment, because of the nature of the language. Then there is the view that says that
you should only do conservative optimizations so that code works as defined by the
language, or agressive optimizations so that under multithreaing assumptions the code
could be incorrect. Agressive optimization gives you overall better code, but it doesn't
help if that code is incorrect. For example

x = 2;
CreateThread()
x = 3;

has the characteristic that there are two sequential assignments of values to x, and no
intervening use of x. Therefore, the compiler should be free to migrate the assignment of
x=3 backward and eliminate x=2 entirely, or migrate the assignment x=3 forward and also
eliminate x=2 entirely. Since we are working with a single-thread assumption in the
compiler, these are both legitimate optimizations, and there is no reason to expect that
the thread launched by CreateThread would even see a *defined* value of x, let alone a
specific one. It hinges on the issue of what is meant by "write". For example, in
conservative optimization, one possible assumption is that any function call can modify
any variable at any time for any reason, and therefore the x=2 must be done before a
function call. An agressive optimizer would know that CreateThread is a kernel API that
does not and cannot use any global state in the program, and therefore could neither use
nor affect the value of x, no matter where x is stored, and consequently the variable x
lifetime does not intersect any lifetime of the function called. The purpose of
declarations like volatile was to allow compiler writers maximum flexibility in making
single-thread assumptions about execution while allowing some independent thread of
control (initially, a piece of hardware, which is certainly an independent thread of
control) to modify a value. Once we started muiltithreading, the semantics of volatile
apparently became confused, and remain confused (although the C++ committee is apparently
considering regularizing the semantics of volatile to deal with variables shared between
threads), but the ultimate consequence was that the Microsoft compilers are quite
conservative in their optimization compared to what we could do even 20 years ago, let
alone 35 years ago. Systems like OpenMP are designed to specify when there is concurrency
so the compiler knows the scope of certain optimizations (such as within a code block in
OpenMP, anything goes, because the lifetimes of variables is carefully controlled; if you
use a variable you have not specified, your program behavior is undefined).

But characterizing the behavior in terms of low-level concepts like "values written" is
not the right way to characterize value lifetimes. In a reasonable optimizing compiler,
the assigment x=2 would never take place; only if you cripple the optimizer would it occur
at all. Given the single-thread assumption, there is no way the compiler needs to
consider "enforcing" even the x=2 as being a reasonable thing to guarantee.. So how the
specifications can be written to require this escapes me entirely. Only if there is a
mechanism the programmer can invoke to indicate concurrency (perhaps "volatile", perhaps a
pragma) is it reasonable to assume that a compiler has any responsibility for enforcing
threading, since the language semantics simply don't address the problem at all. The fact
that current compilers can make anything work is due entirely to the bizarre hybrid of
trying to get a single-threaded language functional in a multithreading environment,
largely by crippling most code motion optimizations. This should not be necessary; it
should fall on the programmer to take responsibility for informing the compiler of
concurrency.
*****

But the posix threads guarantee requires that pthread_create()
execute a membar. *That* is what makes the value of x visible
*to the new thread*.
*****
No. As I indicated above, there is no reason that the compiler needs to enforce the idea
that the code for x=2 is even generated, so a membar is irrelevant to the discussion. A
memory bar deals with the low-level implementation of the hardware, and has nothing to do
with enforcing the high-level semantics of the language. Any optimizing compiler should
be able to deduce what the visibility of variables to library functions is, and it is up
to the programmer to specify if this assumption is not correct. In the case of creating a
thread, the thread creation function cannot see any variables, and we knew 30 years ago
how to determine this. The creating of a separate thread of control that CAN see the
values is not inferrable from the semantics of the function itself; in fact, it could be
creating a thread for which x is not used or perhaps even visible. Memory bars don't
affect any of this.
*****

B. However, if you were to uncomment the line that sets x to 3, thread 2
might not see that value, even if it were executed before thread 2 got
around to testing it. That is, thread 2 might still observe the value 2,
even though from thread 1's perspective, x contains 3, at the moment thread
2 tested it.
****

No, you would definitely see the value 3. The question is WHEN you
would see it. Because we no longer have a guarantee of data
stability, the sequence could be

x=0
x=2
createthread
x=3

and the thread has the sequence
?=x

Now the issue is nondeterministic. Once the thread is launched, it
MIGHT see the value 2, if the assignment x=3 has not been performed,
or it MIGHT see the value 3, if the assignment x=3 has been
performed.

No. Even if the assignment x=3 has been performed in thread 1 before
thread 2 starts running, it is NOT guaranteed to be visible to thread 2.
It is not the point at which thread 2 starts running at which data
is guaranteed to be visible, it is the point at which the thread was
created.
****
Why? There is no reason to assume that the assignment x=2 has been performed, because a
decent optimizing compiler might have eliminated it. And the issue also doesn't deal with
memory barriers, because of timing you cannot tell if x=3 has been done. Consider

x=2;
CreateThread(t1);
x=3;
CreateThread(t2)
x=4;
<end of timeslice on uniprocessor>

t1 starts running
what value does it see?

unknown, 2, 3, or 4? There is actually not even a guarantee that it sees 2,because a
memory barrier has no meaning if the code was never generated. But it is not guaranteed
that it sees 2, even if the code WAS generated, because the membar of CreateThread would
force x=3 to be visible, assuming code was generated for x=3. I'm disagreeing with the
language that talks about this in terms of values "written", because that implies that the
only issue is the low-level hardware. The only valid specificaiton would be talking in
terms of *assignment* of values in the language, for which membars are irrelevant. If,
however, to make the implementation consistent with the language, a membar is required,
that is a mere implementation detail, about as important as whether the AX or BX register
was used as an intermediate register. The specification has to be in terms of the
language, and the language has to guarantee that the code is generated. The rest, such as
membar issues, is mere implementation detail.
*****

This rule is meant to emphasize non-deterministic thread scheduling,
and to make clear the point at which a membar (associated with
pthread_create()) must be executed.
*****
This still confuses platform implementation with language semantics and compiler
implementation, which are two separate issues.
*****

If you had two threads t1 and t2 and they executed like this:

t1 t2
x=1
x=2
y=x

you are already aware that y in t2 might not be the value 2. This
is NOT because of ordering of instructions in time (note: assignment
y=x is AFTER x=2), it is because there is no guarantee of memory
visibility across threads without explicit synchronization.
****
Explicit synchronization of what with who? For synchronization to make sense, there has
to be a determination of what is being synchronized. The issue of whether x=1 is even
performed is an issue. For atomic assignments, synchronization is not required. If the
assigment x=2 is performed as shown, y=x should see 2, simply because if we are talking
about the actual assignment, the language requires that whatever memory location holds x
must be modified at the sequencing point, and a memory barrier is not part of this
discussion. If you want to introduce memory barries, they are not correlated with
synchronization at all; they are merely memory barriers, and should have nothing to do
with mutexes, conditions, etc. Coupling these two only leads to confusion. I think the
whole point here is that there are several orthogonal concepts which have all become
deepl;y intertwined into a confusing mess:
1. The formal semantics of the language
2. Optimizing compilers
3. Hardware-specific idiosyncasies
In the formal semantics of the language, there is only a single-thread model at the moment
for C. So there is no reason to expect that ANY assignment is performed at the apparent
point of execution. All that is required is that under a single-thread assumption, the
computation expressed by the program is performed. This leads to the issue of
optimization, where the compiler is free to reorder the code in any way it sees fit as
long as the formal (single-threaded) semantics are obeyed. Thus a construct like
x=3;
lock();
x=4;
unlock();
x=5;
f(x)

would mean that the code would be, from a single-thread assumption.
lock();
unlock();
f(5);
and a good compiler will generate that if it is doing agressive optimizations.

As volatile was first specified, it required that at sequencing points, volatile values
that had changed *must* be stored, so the above optimizations would be forbidden if the
declaration was 'volatile x;' but are NOT forbidden otherwise. So there is no reason to
expect that 'lock' and 'unlock' operations, no matter what they are called, have any
effect on the optimization, especially if the compiler understands what values are visible
to those library functions (such as, only their parameters).

However, if you turn optimizations off, then the code just happens to generate the
instructions to do all the redundant assignments. Note that adding or removing memory
barriers has no effect on the highly-optimized code because there is no actual pending
assignments to optimize.

As long as we try to keep combining multithreaded logic in a single-threaded language,
there are going to be these confusing issues, but my point is that the specifications as
written do not address any of the real issues; they talk about values being "written", and
that's all. So there are some serious assumptions about what optimizations are and are
not permitted, which is not a specification handled by the language, only by the compiler
implementation, and a conforming compiler is not required to do other than meet
single-thread semantics.
******
****
Now
if you have

t1 t2
DNE
x=1
create
created (x=1 visible)
x=2 starts running
y=x

this adds the thread creation into the mix, however you can see that
this is just like the previous case, and y=2 is not guaranteed.
****
But I don't understand why x is even guaranteed to be 1! The specification says it WANTS
x=1, but if it c;laims to be platform-independent then it has to be
compiler-implementation-independent, and any conforming compiler is free to eliminate x=1.
*****

1. Whatever memory values a thread can see when it creates a
new thread can also be seen by the new thread once it
starts. Any data written to memory after the new thread is
created may not necessarily be seen by the new thread, even if
the write occurs before the thread starts.

*****
No, that is what is confusing. If the write takes place, there is
no reason the new thread will not see it, because the write is
taking place to the location where the variable is found. The issue
is merely the relative timings of when the write takes place. The

No. The new thread might not see it because without explicit
synchronization, data is not guaranteed to be visible across threads,
regardless of time ordering.
*****
Again, this confuses compiler implementation, language semantics, and hardware
implementation in a way that makes no sense. I could write a conforming compiler that
violated this, and all the membars and syncrhonization inthe known universe would not
solve the problem, and I could still argue that my compiler was absolutely correct and
conforming to the specifications of the language. The only way out of this is to modify
the language to support concurrency semantics. 'volatile' or some suitable set of pragmas
might work, but as the language is currently specified, the "rules" are essentially
nonsensical. They assume that "synchronization" (which is unspecified) is going to change
the language semantics, and I see no reason that should be so.
*****

suggestion that the write would not be visible if it happened before
the thread starts seems odd, because that would suggest that the
thread gets a local copy of the variable, and that the
implementation reserves the right to make the copy early (at thread
creation) or late (at thread startup). But why would a threaded

It doesn't suggest copies at all. It is about memory visibility.
*****
And this is my point. As I keep saying, there are language semantics, compiler
implementation, and hardware implementation, and given the language semantics are already
undefined in a multithreaded environment, worrying about hardware implementation is not
going to solve the problem.
*****

implementation feel any need to worry about the time a copy is made
when the semantics of the language do not require that a copy be
made at all, and for that matter, would actually "teach against"
this practice?

A memory address is a memory address. Data written to that address
appears in memory. Caches are mechanisms to optimize performance
while giving the illusion of those semantics being maintained.

x=3 does not mean data is written to memory.
*****
Depends on your interpretation of language semantics. That's my point. x=3 doesn't even
mean that there is code GENERATED to do the store, so worrying about membars seems a
pointless exercise.
joe
*****

-frank
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.



Relevant Pages

  • Re: WaitForSingleObject() will not deadlock
    ... One is to hijack the semantics of volatile to disable compiler optimizations ... and otherwise let the compiler to agressive optimization. ... Agressive optimizations are the ones that work on the edge of the semantics of the ... Because the compiler can see into lock and unlock, it is able to reduce f ...
    (microsoft.public.vc.mfc)
  • Re: User-defined type attributes (replacing genericity)
    ... What's wrong with having control over the semantics? ... (Control over the semantics of the language. ... But the compiler is in an even worse position. ... Sorry, but it is generics which constitute a meta language, so this ...
    (comp.lang.ada)
  • Re: Beyond CL?
    ... around in the internals of the language. ... as the programmable programming language; ... Eventually a bank of many users' optimizations could be ... approach the fabled "sufficiently smart compiler". ...
    (comp.lang.lisp)
  • Re: RAD vs. performance
    ... wrote in the dynamically typed language is wrong. ... to start executing compiler optimisations by hand. ... same optimizations. ...
    (comp.lang.misc)
  • Re: Teaching new tricks to an old dog (C++ -->Ada)
    ... Because the run-time check code required by the language will require ... > Because the compiler checks that there is static matching ... of a feature rich language like Ada, ... into the optimizations. ...
    (comp.lang.ada)