Re: synchronization demo


On Thu, 03 Sep 2009 15:33:02 -0700, not_a_commie <notacommie@xxxxxxxxx> wrote:

> I put together the code below for some training at work. It contains
> numerous methods for doing thread synchronization. Compile it in
> release mode to see the various speeds. Change the functions in the
> two "for" loops at the bottom (but keep them both the same function).
> You'll need a multi-core CPU to see any problem with the non-protected
> increment as the CLR is not dumb enough to break up an increment
> function when scheduling threads on the same core.

It's not really the CLR so much as the platform itself. I haven't looked, but I suspect the "_count++" gets JITted to an "inc" instruction. And I'm pretty sure a context-switch can't happen mid-instruction for a thread, even if the Windows scheduler is doing the switching. So with one core, you can't switch from one thread to another until the increment has finished.
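
To make the point concrete, here is a minimal sketch (not the original poster's code; the class and field names are made up for illustration) that runs an unprotected increment next to Interlocked.Increment on two threads. On a multi-core machine the unprotected counter will typically come up short, while the interlocked one will not.

```csharp
using System;
using System.Threading;

static class IncrementDemo
{
    static int _unsafeCount; // incremented with no synchronization
    static int _safeCount;   // incremented with Interlocked.Increment

    // Runs two threads that each increment both counters `iterations` times.
    public static void Run(int iterations, out int unsafeCount, out int safeCount)
    {
        _unsafeCount = 0;
        _safeCount = 0;

        ThreadStart work = () =>
        {
            for (int i = 0; i < iterations; i++)
            {
                _unsafeCount++;                        // plain read-modify-write
                Interlocked.Increment(ref _safeCount); // atomic across cores
            }
        };

        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.Start();
        b.Start();
        a.Join();
        b.Join();

        unsafeCount = _unsafeCount;
        safeCount = _safeCount;
    }

    static void Main()
    {
        int unsafeCount, safeCount;
        Run(1000000, out unsafeCount, out safeCount);
        Console.WriteLine("unprotected: {0}", unsafeCount); // can be less than 2000000
        Console.WriteLine("interlocked: {0}", safeCount);   // always 2000000
    }
}
```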

> I never did figure
> out how the Monitor class itself does locking. It may be using some
> kind of dictionary / data structure that doesn't require locking for
> the TryGetValue call.

I forget the details, but I'm sure I've seen them before. So if you're curious, Google should be able to find it for you. Monitor is reasonably efficient, but it requires some kind of synchronization/locking, just as any thread-synchronization mechanism would.

> Have any of you seen a data collection structure
> with a "TryGetValueOrAddNew" method that is natively thread safe?

Define "natively".

As far as your examples, I've got some comments:

-- You go to some lengths to describe various mechanisms as "slow". But there's not any quantification of that, and the fact is for many situations "slow" is something that just wouldn't ever be an issue. The comment about ManualResetEvent being "system-wide" and thus "slow" has two problems: "system-wide" doesn't in and of itself necessarily imply "slow", and since all ManualResetEvents are anonymous, I doubt they are "system-wide" anyway (you need to use EventWaitHandle to get a named event).

For sure, using phrases like "incredibly slow" is strikingly misleading, IMHO. And it seems to me that in many of the examples you've provided, there are performance problems introduced simply via the implementation, rather than because of the use of any specific .NET types, making any performance comparison between different approaches invalid.

-- Generally, where you use ManualResetEvent instances, you should be using the Monitor class notification methods instead: Wait(), Pulse(), etc.
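
As a rough illustration of that suggestion, the sketch below replaces a one-shot event with Monitor.Wait() and Monitor.PulseAll() around a boolean flag. The ReadySignal and SignalDemo types are hypothetical names invented for this example; they are not from the original code.

```csharp
using System;
using System.Threading;

// Hypothetical one-shot signal built on Monitor instead of ManualResetEvent.
sealed class ReadySignal
{
    private readonly object _gate = new object();
    private bool _ready;

    public void WaitUntilReady()
    {
        lock (_gate)
        {
            // Re-check the condition in a loop: Wait() releases the lock while
            // blocked and re-acquires it before returning.
            while (!_ready)
                Monitor.Wait(_gate);
        }
    }

    public void SetReady()
    {
        lock (_gate)
        {
            _ready = true;
            Monitor.PulseAll(_gate); // wake every thread waiting on _gate
        }
    }
}

static class SignalDemo
{
    // Publishes a value, signals, and returns what the waiting thread saw.
    public static int Run()
    {
        ReadySignal signal = new ReadySignal();
        int value = 0;
        int observed = 0;

        Thread consumer = new Thread(() =>
        {
            signal.WaitUntilReady();
            observed = value;
        });
        consumer.Start();

        value = 42;        // written before the signal is set
        signal.SetReady();
        consumer.Join();
        return observed;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 42
    }
}
```

Because the flag is only read and written while holding the lock, the waiter cannot miss the signal even if SetReady() runs before WaitUntilReady() is called.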

-- Given that you're not doing real spin-waits in the so-called "spin" implementations, you ought to be using Thread.Sleep(1) to ensure an unconditional yield, rather than Thread.Sleep(0) (which will yield only to threads of the same priority or higher).

-- Note that .NET 4 will include "spin"-ning lock implementations; you might want to consider targeting .NET 4 for your demonstration.
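
For reference, one of the spinning primitives that shipped in .NET 4 is System.Threading.SpinLock. The sketch below assumes that is the kind of type meant here; the SpinLockDemo class and its counter are illustrative only.

```csharp
using System;
using System.Threading;

static class SpinLockDemo
{
    // SpinLock is a mutable struct, so the field must not be readonly.
    static SpinLock _spin = new SpinLock();
    static int _count;

    // Two threads each increment the counter `iterations` times under the spin lock.
    public static int Run(int iterations)
    {
        _count = 0;

        ThreadStart work = () =>
        {
            for (int i = 0; i < iterations; i++)
            {
                bool lockTaken = false;
                try
                {
                    _spin.Enter(ref lockTaken);
                    _count++;
                }
                finally
                {
                    if (lockTaken) _spin.Exit();
                }
            }
        };

        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return _count;
    }

    static void Main()
    {
        Console.WriteLine(Run(1000000)); // prints 2000000
    }
}
```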

-- I don't get the point of some of the implementations, in that they add nothing semantically, and essentially delegate (sometimes with added inefficiencies even beyond the wrapper logic) to the built-in .NET mechanisms. For example, the Locker struct and BransMonitor class.

-- Speaking of the Locker struct, your comment that "this does allocate one additional thing on the stack (it's a struct)" is dangerously false. It's a common misconception that structs are always allocated on the stack; they are not, and in your code, the Locker struct is boxed and passed as an object instance to the "using" statement. It never exists on the stack.

In addition to that basic error in understanding memory usage, there is another potential pitfall here, which is the boxing of the struct in the first place. Your code avoids a problem, because the "using" statement is always working with the originally-boxed value of Locker. But any time you introduce reference-type-like semantics on a value type (i.e. struct), you run the risk of operating on the wrong boxed value at some point. This is of particular concern for dealing with interfaces like IDisposable and anything that involves locking, which of course are _both_ characteristics of your Locker struct.
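
A small sketch of the "wrong boxed value" hazard, using a made-up DisposableFlag struct rather than the Locker struct from the original code: converting the struct to IDisposable boxes a copy, so calling Dispose() through the interface changes the copy and leaves the original variable untouched.

```csharp
using System;

// Hypothetical mutable struct, used only to show the copy made by boxing.
struct DisposableFlag : IDisposable
{
    public bool Disposed;

    public void Dispose()
    {
        Disposed = true;
    }
}

static class BoxingDemo
{
    // Returns the state of the ORIGINAL struct after disposing the boxed copy.
    public static bool OriginalAfterBoxedDispose()
    {
        DisposableFlag flag = new DisposableFlag();
        IDisposable boxed = flag; // boxing copies the struct into a heap object
        boxed.Dispose();          // mutates the boxed copy only
        return flag.Disposed;
    }

    // Returns the state of the BOXED copy after disposing it.
    public static bool BoxedAfterBoxedDispose()
    {
        DisposableFlag flag = new DisposableFlag();
        IDisposable boxed = flag;
        boxed.Dispose();
        return ((DisposableFlag)boxed).Disposed;
    }

    static void Main()
    {
        Console.WriteLine(OriginalAfterBoxedDispose()); // prints False
        Console.WriteLine(BoxedAfterBoxedDispose());    // prints True
    }
}
```

If Dispose() were releasing a lock, the two values would disagree about whether the lock is still held, which is exactly the kind of bug described above.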

Honestly, this pattern for synchronization really isn't very useful anyway. There's practically no structural difference in the code between using "using" and using "lock", but the pattern using "using" is obfuscated (i.e. doesn't make it as clear that some synchronization is going on), as well as less efficient (after all, inside you basically wind up doing the same thing as the "lock" statement would anyway). And on top of all that, once you have a type where its intended use is that it locks upon construction and then you're supposed to dispose it when you want to release a lock, you introduce the possibility of a bug where someone forgets to dispose of it.
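
To compare the two shapes side by side, here is a sketch with a hypothetical reconstruction of such a type. HypotheticalLocker is a guess at the general idea, not the Locker struct from the original post.

```csharp
using System;
using System.Threading;

// Hypothetical lock-on-construct, unlock-on-dispose wrapper.
struct HypotheticalLocker : IDisposable
{
    private readonly object _target;

    public HypotheticalLocker(object target)
    {
        _target = target;
        Monitor.Enter(target); // lock is taken as a side effect of construction
    }

    public void Dispose()
    {
        Monitor.Exit(_target);
    }
}

static class LockStyleDemo
{
    static readonly object Gate = new object();
    static int _count;

    // The built-in form: the synchronization is obvious at a glance.
    public static void IncrementWithLock()
    {
        lock (Gate)
        {
            _count++;
        }
    }

    // The wrapper form: same structure, but the locking is hidden in a
    // constructor, and omitting "using" would leave the monitor held.
    public static void IncrementWithUsing()
    {
        using (new HypotheticalLocker(Gate))
        {
            _count++;
        }
    }

    public static int Run()
    {
        _count = 0;
        IncrementWithLock();
        IncrementWithUsing();
        return _count;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 2
    }
}
```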

-- Finally, a couple of minor comments with respect to your Main() method. First, any time you are doing performance measurement, you should execute a "warm-up" loop (100-500 iterations is fine) before you actually start measuring. Second, I don't see the point in setting the COM apartment state for your threads: as near as I can tell you're not using any COM stuff anyway, nor should it really affect performance in any case (the apartment state is more a code-correctness thing, related to whether the code is doing its own synchronization or is going to rely on COM to handle data marshaling for it).
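
A minimal sketch of the warm-up idea, assuming a Stopwatch-based harness. The TimingDemo class, the Measure helper, and the iteration counts are illustrative choices, not taken from the original code.

```csharp
using System;
using System.Diagnostics;

static class TimingDemo
{
    static readonly object Gate = new object();
    static int _count;

    // Example workload: one increment under the built-in lock statement.
    public static void LockedIncrement()
    {
        lock (Gate)
        {
            _count++;
        }
    }

    // Runs `action` warmUp times untimed (so JIT compilation and first-use
    // costs are paid up front), then times `iterations` further calls.
    public static TimeSpan Measure(Action action, int warmUp, int iterations)
    {
        for (int i = 0; i < warmUp; i++)
            action();

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();

        return watch.Elapsed;
    }

    static void Main()
    {
        TimeSpan elapsed = Measure(LockedIncrement, 500, 1000000);
        Console.WriteLine("lock: {0:F2} ms", elapsed.TotalMilliseconds);
    }
}
```

Reporting an actual number per mechanism like this is also what would back up (or refute) a label such as "slow".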

Hope that helps.

Pete