Re: WaitForSingleObject() will not deadlock





On Thu, 05 Jul 2007 22:20:56 -0500, "Doug Harrison [MVP]" <dsh@xxxxxxxx> wrote:

On Thu, 05 Jul 2007 17:02:44 -0400, Joseph M. Newcomer
<newcomer@xxxxxxxxxxxx> wrote:

So the question is, what compromises are made to allow C to work in a multithreaded
environment? One is to hijack the semantics of volatile to disable compiler optimizations
that would not be thread-safe, and otherwise let the compiler to agressive optimization.

That's an incredibly undesirable approach for a number of reasons. AFAIK,
no one approaches the problem in that way.
*****
Did you see the C++ Standards Comittee discussion I pointed out? They are in fact
considering that. We were certainly doing this 35 years ago, so the technology is
well-understood. I worked with optimizing compilers between 1969 and 1983, and these were
issues we were constantly discussing and worrying about. We designed languages that had
explicit specifications of concurrency, and although the word 'volatile' was not one we
used, the issues raised by volatile seem indistinguishable from the issues we were
addressing.

Why is this undesirable?
*****

Another is to do conservative optimizations such that the issues of multithreading do not
become problems. What Microsoft has done is to do a very, very conservative optimization
strategy so that the code continues to work.

After thinking about it some, I think the compiler is a lot less
"conservative" than you've portrayed it to be. I've implemented dummy
lock/unlock functions for the example I posted earlier:

int x;

void lock()
{
}

void unlock()
{
}

bool f()
{
lock();
int x1 = x;
unlock();
lock();
int x2 = x;
unlock();
return x1 == x2;
}

void g()
{
lock();
++x;
unlock();
}

When I compile this in VC2005 with "cl -O2 -c -FAs a.cpp", the assembly
code I get for f is:

?f@@YA_NXZ PROC ; f, COMDAT

; 13 : lock();
; 14 : int x1 = x;
; 15 : unlock();
; 16 : lock();
; 17 : int x2 = x;
; 18 : unlock();
; 19 : return x1 == x2;

mov al, 1

; 20 : }

ret 0
****
This is due to the VS2005 compiler which is able to consider inlining code that it can
examine from the current compilation unit. This is not actually agressive optimization;
it is actually very conservative optimization that just has a lot of knowledge.

Agressive optimization deals with knowing the potential visibility of code that it
otherwise cannot examine. For example, that functions have no visibility to values other
than those passed in via their parameters. We had designed and implemented languages with
concepts like 'safe', and languages whose prototypes included specifications of
visibility, e.g., something like

int f(int h, int i) private;

where 'private' meant that it could only see state passed into it via parameters, and used
no external state it its computations, so calling it could not modify or depend on
external values. The point of this was so we could preserve optimizations around library
calls. For example,
extern x;
x = 1;
int n = f(1, 2);
x = 2;
int m = f(1, 2);
x = 3;
int k = f(x, 2);
would actually generate the code that looked like this
push 2,
push 1
call f
mov n,eax
call f
mov m,eax
pop
push 3
call f
mov k,eax

Then if we declared it as
int f(int x, int y) constant;
the code would have eliminated the second call to f, because this said it not only didn't
depend on external values, but it had no internal state, either. So fread() would have
been private but not const in this language.

When we finally did optimization, we were getting very small and fast code because the
compiler knew, from the prototypes, what functions could be optimized in creative ways,
without doing (as VS2005 does) actual inspection of the bodies of the code.

Note that you can force the calls to the functions by doing
__declspec(noinline)
to the declarations of the functions, but automatic inlining of functions in the same
compilation unit is not an agressive optimization, and the consequences of inlining a
function with an empty body only to discover there is no code to generate are not even
remotely agressive optimizations, they are actually rather trivial optimizations (we were
doing this in 1978, so it only took Microsoft about 25 years to catch up with the state of
the art [this assumes that VS2005 was started right after VS2003 was released]).

Agressive optimizations are the ones that work on the edge of the semantics of the
language, and which require careful interpretation of the language semantics to justify.
The people who do agressive optimizations are the ones that find the holes in the
specifications.

In the absence of prototypes that specify global side effects, for example, value
propagation of assignments to variables whose scope exceeds local scope are agressive.
Optimizations which do not assume that pointer arithmetic could affect a value are
agressive, but this is sufficiently interesting that most compilers have an option to
enable or disable "aliasing" optimizations (I was once nuked by an aliasing optimzation,
and we decided it was a compiler bug and had to remove the optimization entirely, because
that language had no constructs to either selectively disable optimizations or have the
equivalent of pragmas).
*****

Because the compiler can see into lock and unlock, it is able to reduce f()
to "return true", and that's fine. Indeed, I think it's a great
optimization, and I'd be a little disappointed if it didn't do it.
****
But VS2003 didn't do it, as far as I know. OTOH, I had a function that did byte swapping
using the inline bswap instruction, and VS6 would actually generate the code inline, but
VS2003, even with the __inline declaration, would not inline it and insisted on calling it
with a call instruction.
****

On the other hand, if lock and unlock resided in some opaque DLL, which the
compiler can't see into, I don't think it is "conservative" for the
compiler not to make assumptions about what those functions may or may not
do. Rather, it would be *incorrect* for the compiler to assume they don't
modify x when (a) it isn't explicitly told that and (b) can't prove it for
itself.
*****
Yes, it is conservative; in particular, if it is known to be in a DLL, it is unlikely it
can depend on any variable in my program (unless I pass a pointer into the DLL which it
holds onto, and that's an aliasing issue), and it also applies to external calls in
general, whehter in a DLL or not, which is why I call the compiler "conservative". It
should be possible to specify scope, e.g.,

#pragma private(lock, unlock)

would say that lock and unlock (using the keyword I used above) can affect only the values
pointed to by their parameters and have no other global scope, and while they can make
state changes in those variables they can make state changes nowhere else. So I can apply
this to any function which does not affect global variables of my program, whether it is
in a library of my own or a DLL or the kernel. Then all kinds of agressive optimizations
such as code motion, loop unrolling, value propagation, etc. are possible, because the
optimizations are known to be safe. Relying on the fact that an external call will always
trigger conservative optimizations assumes that the compiler and metalanguage will not
evolve to allow this. In that case, I would be perfectly correct in saying that lock and
unlock are private, and this means that it is completely permissible to change
x=1
lock();
x=2;
unlock();
x=3;

to be
lock();
unlock();
x=3

and the issue of the semantics of lock() having anything to do with memory visibility is
erroneous. In addition, whether lock does or does not do harware memory barriers becomes
irrelevant. What matters is not the semantics of lock, or the implementation of the
hardware, but the fact that x is a variable which has concurrent visibility, and the
presence of lock/unlock does not indicate that, nor should it have any affect. If x is
defined has having concurrent visibility, then the assigments MUST be performed, and in
fact I would argue that the memory barrier becomes part of the semantics of that store
operation, and therefore again is not related to what lock/unlock do. For exmaple, if I
had

__declspec(concurrent_access) int x;
then I would NOT have the option of the above optimization; instead I would have to
generate code of the form

mov x, 1
sfence
push mutex_address
call lock
mov x,2
sfence
push mutex_addres
call unlock
mov x,3
sfence

and so the semantics of lock/unlock are irrelevant, my declaration of concurrent access
requires the sfence instructions be generated. By lumping the language semantics,
compiler implementation, lock semantics, and harware implementation into one messy
specification that in fact is fundamentally inconsistent with the single-threaded language
semantics, I think the result is a useless specification, and one which overall makes no
sense.
*****


Finally, imagine the compiler *can* actually see into lock/unlock, wherever
they reside, and those functions are implemented as mutex lock/unlock
operations. Per the example I just presented, it would be able to tell they
don't modify x, and it could optimize f() as shown above. Clearly, that
would be wrong for MT programming. The absolutely wrong approach to solving
this problem would be to require the user to make x volatile; that was the
shortcoming of the ancient compilers you talked about earlier. The *right*
approach would be to provide some way to mark lock/unlock as having "memory
barrier" semantics. (Of course, if the compiler can see into those
functions and they actually contain memory barrier instructions, it should
be unnecessary to mark them as special, because the compiler should
understand what is implied by using memory barriers.)
****
Why? The user knows the values are used concurrently, so why should the compiler have to
infer the concurrent access, even if it were possible? Sounds like something along the
NP-hard dimension, actually. lock/unlock have nothing to do with memory barrier
semantics, and as I point out, memory barriers are irrelevant anyway given the semantics
of the language. The point about 'volatile' is that this puts the burden back on the
programmer, and the choices are ulimately
1. Generate marginal code but not violate concurrency issues
2. Let the programmer specify concurrency and be really agressive
the rest of the time, which is most of the time
3. Get it wrong most of the time when concurrency is involved

2 is a better compromise; you can always default the compiler to 1, and 3 is unacceptable.
The specs we;re arguing about assume that 1 is always true and that the only issue is
hardware implementaiton, and I think forcing the hardware synchronization into the locking
semantics is the wrong approach.
joe
*****
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.



Relevant Pages

  • Re: WaitForSingleObject() will not deadlock
    ... represent an incorrect implementation of the language. ... the *compiler* does not guarantee this. ... but to state it in terms of the execution instead of the formal semantics of the language ... as long as the optimizations do not change the semantics of the language). ...
    (microsoft.public.vc.mfc)
  • Re: Newbie question: accessing global variable on multiprocessor
    ... has guaranteed semantics. ... 'volatile' can also be used for other purposes (than ... If you know for a fact that compiler optimizations are your only ... you may know what optimizations the hardware and the ...
    (comp.programming.threads)
  • Re: optimized code
    ... > loop invariants are handled by the JIT not by the compiler fron-ends. ... generates the best optimized MSIL of any of the .NET languages. ... standard native code optimizations on MSIL code. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: c compilation - gcc vs visual c
    ... I recently compiled a numerically intensive c project under cygwin gcc ... MS focuses a lot more on specific optimizations, ... the simplest approach, however (and the one I currently use in my compiler), ... (silly code), are ones I focus on fixing. ...
    (comp.lang.c)
  • Re: Migrating ARM9E codebase to ARM11
    ... > optimizations done using hand-coding, only the control code is left to ... You should then consider upgrading to a newer compiler. ... wish to target might be faster than the assembler optimised for the 9E. ... I do not believe that the ARM ARM for architecture version 6 has been ...
    (comp.sys.arm)