Re: EnterCriticalSection
- From: "Chris Thomasson" <cristom@xxxxxxxxxxx>
- Date: Wed, 10 Oct 2007 17:34:19 -0700
"Ben Voigt [C++ MVP]" <rbv@xxxxxxxxxxxxx> wrote in message news:Oqdkja4CIHA.5856@xxxxxxxxxxxxxxxxxxxxxxx
"Chris Thomasson" <cristom@xxxxxxxxxxx> wrote in message news:3r6dnU8Tkdb7yJnanZ2dnUVZ_ournZ2d@xxxxxxxxxxxxxx
"Chris Thomasson" <cristom@xxxxxxxxxxx> wrote in message news:uridnUt1n7VPyZnanZ2dnUVZ_gGdnZ2d@xxxxxxxxxxxxxx
"Ben Voigt [C++ MVP]" <rbv@xxxxxxxxxxxxx> wrote in message news:u6bwt8gBIHA.1184@xxxxxxxxxxxxxxxxxxxxxxx
"Chris Thomasson" <cristom@xxxxxxxxxxx> wrote in message news:cqednWxWK_SafnTbnZ2dnUVZ_qKgnZ2d@xxxxxxxxxxxxxx
"Elcaro Nosille" <Elcaro.Nosille@xxxxxxxxxxxxxx> wrote in message news:46dc59d2$0$7690$9b4e6d93@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
EnterCriticalSection section is extremely fast when there's no
contention with another thread holding this critical section.
[...]
That is a false assertion. Describing the operation as "extremely fast" is misleading, to say the least. You probably don't know that acquiring and releasing an uncontended lock involves 2 interlocked RMW instructions and 2 memory barriers, one of which is very expensive. Example using SPARC membar instruction:
Well, minimizing synchronization is worthwhile, because you can't get faster than nothing at all. But a critical section is (one of) the fastest synchronization primitives.
Not really true.
[...]
It is not true when compared to wait-free queue implementations that do not rely on interlocking or #StoreLoad/#LoadStore membar instructions (e.g., LOCK prefix, or mfence instructions on x86)...
How do you handle cache incoherency without using LOCK or some other interlocking mechanism?
By using the memory barriers that are already implied by the x86... I am talking about an unbounded single producer/consumer wait-free queue that relies on the assertions made in the following paper:
http://developer.intel.com/products/processor/manuals/318147.pdf
The queue has no #StoreLoad, or even #LoadStore, dependencies. The MOV instruction actually provides more than enough memory ordering according to the paper linked above, which basically states that loads and stores on current x86 behave as follows:
void* atomic_x86_loadptr(
    void* volatile* psrc
) {
    /* atomic load */
    void* const ptr = *psrc;
    /* acquire-load membar, implied by the MOV on x86:
       membar #LoadStore | #LoadLoad */
    return ptr;
}

void* atomic_x86_storeptr(
    void* volatile* pdest,
    void* const src
) {
    /* release-store membar, implied by the MOV on x86:
       membar #LoadStore | #StoreStore */
    /* atomic store */
    *pdest = src;
    return src;
}
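For anyone who wants something a compiler will accept, here is a minimal sketch of the same two operations written with C11 atomics. The names load_acquire and store_release are made up for this illustration, and using <stdatomic.h> is my assumption, not something the Intel paper prescribes. The acquire/release orderings also keep the compiler from reordering around the access, and on x86 they are generally expected to compile down to plain MOV instructions.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: load with acquire ordering
   (#LoadLoad | #LoadStore after the load). */
static void* load_acquire(_Atomic(void*)* psrc) {
    return atomic_load_explicit(psrc, memory_order_acquire);
}

/* Hypothetical helper: store with release ordering
   (#LoadStore | #StoreStore before the store). */
static void* store_release(_Atomic(void*)* pdest, void* src) {
    atomic_store_explicit(pdest, src, memory_order_release);
    return src;
}
```

Neither helper asks for #StoreLoad ordering, so no LOCK prefix or mfence should be needed for them on x86.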
I can show you some pseudo-code if you want... Also, read all of this:
http://en.wikipedia.org/wiki/RCU
You could fill the queue on one core and the other would never see a change.
The single consumer will always eventually get a coherent view of every new node linked into the queue by the single producer. The consumer uses a "data-dependent" memory barrier after the load of the shared pointer that represents the tail of the queue. Here is an example of dependent barriers:
https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068
(read all)
Notice that reader threads can walk the linked-list while a writer thread concurrently mutates it.
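To make the queue discussion concrete, here is a rough sketch of an unbounded single-producer/single-consumer linked-list queue in C11. It is not the actual implementation referred to in this thread: the names (node, spsc_queue, spsc_init, spsc_push, spsc_pop) are hypothetical, it stores plain ints, and it uses an acquire load where the post talks about a data-dependent barrier. It also calls malloc/free, so whether it is truly wait-free depends on the allocator.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    _Atomic(struct node*) next;  /* written by producer, read by consumer */
    int value;
} node;

typedef struct {
    node* head;  /* touched only by the consumer; always a dummy node */
    node* tail;  /* touched only by the producer */
} spsc_queue;

static bool spsc_init(spsc_queue* q) {
    node* dummy = malloc(sizeof *dummy);
    if (!dummy) return false;
    atomic_init(&dummy->next, NULL);
    dummy->value = 0;
    q->head = q->tail = dummy;
    return true;
}

/* Producer side: fill in the node, then publish it with a release store. */
static bool spsc_push(spsc_queue* q, int value) {
    node* n = malloc(sizeof *n);
    if (!n) return false;
    atomic_init(&n->next, NULL);
    n->value = value;
    atomic_store_explicit(&q->tail->next, n, memory_order_release);
    q->tail = n;
    return true;
}

/* Consumer side: an acquire load makes the node's contents visible. */
static bool spsc_pop(spsc_queue* q, int* out) {
    node* dummy = q->head;
    node* next = atomic_load_explicit(&dummy->next, memory_order_acquire);
    if (!next) return false;  /* queue is empty */
    *out = next->value;
    q->head = next;           /* next becomes the new dummy */
    free(dummy);              /* producer no longer touches the old dummy */
    return true;
}
```

There is no interlocked RMW and no #StoreLoad barrier anywhere in push or pop; the only synchronization is one release store and one acquire load on the next pointer.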