Re: How to get synchronized efficiently
- From: "m" <m@xxx>
- Date: Sun, 25 May 2008 17:43:31 -0400
A semaphore, behaving as a mutex, is one way to protect access to a queue.
(It should be obvious why it must behave as a mutex: with a count above one,
two threads could modify the queue at the same time.) This method interacts
with the OS thread scheduler and tries to minimize the CPU time (not
necessarily the wall-clock time) wasted waiting for exclusive access to the
queue.
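A minimal sketch of the idea, written with POSIX `sem_t` and pthreads rather than the Win32 API so it stays portable (on Windows, a semaphore created with an initial and maximum count of 1 would play the same role). The `guarded_queue` type and the `gq_*` functions are hypothetical names for illustration; initializing the count to 1 is what makes the semaphore behave as a mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define QCAP 1024
#define MAX_THREADS 16

typedef struct {
    int items[QCAP];            /* ring buffer storage */
    int head, tail, count;
    sem_t lock;                 /* binary semaphore standing in for a mutex */
} guarded_queue;

static void gq_init(guarded_queue *q) {
    q->head = q->tail = q->count = 0;
    sem_init(&q->lock, 0, 1);   /* initial count 1 => at most one holder */
}

static void gq_lock(guarded_queue *q) {
    while (sem_wait(&q->lock) == -1 && errno == EINTR)
        ;                       /* retry if interrupted by a signal */
}

static void gq_unlock(guarded_queue *q) { sem_post(&q->lock); }

/* Returns 1 on success, 0 if the queue is full. */
static int gq_push(guarded_queue *q, int v) {
    int ok = 0;
    gq_lock(q);
    if (q->count < QCAP) {
        q->items[q->tail] = v;
        q->tail = (q->tail + 1) % QCAP;
        q->count++;
        ok = 1;
    }
    gq_unlock(q);
    return ok;
}

/* Returns 1 and stores the front item in *out, or 0 if the queue is empty. */
static int gq_pop(guarded_queue *q, int *out) {
    int ok = 0;
    gq_lock(q);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        ok = 1;
    }
    gq_unlock(q);
    return ok;
}

static guarded_queue demo_q;

static void *demo_producer(void *arg) {
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        gq_push(&demo_q, i);
    return NULL;
}

/* Pushes nthreads * per_thread items concurrently (nthreads <= MAX_THREADS,
   total <= QCAP), then drains the queue and returns how many came out. */
static int gq_demo(int nthreads, int per_thread) {
    pthread_t tids[MAX_THREADS];
    int popped = 0, v;
    gq_init(&demo_q);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_producer, &per_thread);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    while (gq_pop(&demo_q, &v))
        popped++;
    sem_destroy(&demo_q.lock);
    return popped;
}
```

If no item is lost, the count popped equals the count pushed, which is only guaranteed because the semaphore never admits two threads at once.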
For situations where the work to be done is very small relative to the cost
of interacting with the OS scheduler (user-mode to kernel-mode transitions,
etc.), interlocked operations or spin-wait locks can significantly improve
wall-clock time, but they can also easily waste CPU time.
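The trade-off can be sketched with a spin-then-yield lock built on C11 atomics. This is a simplified, hypothetical stand-in: a Win32 CRITICAL_SECTION initialized with a spin count spins in user mode first and then blocks on a kernel object, whereas this sketch merely calls `sched_yield()` after `SPIN_LIMIT` attempts. `spin_lock`, `SPIN_LIMIT` and the demo helpers are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 4000   /* illustrative; echoes the spin count of 4000 in Kürşat's test */
#define MAX_THREADS 16

typedef struct { atomic_int held; } spin_lock;   /* 0 = free, 1 = held */

static void spin_acquire(spin_lock *l) {
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            /* Test before test-and-set: waiters mostly read a cached value
               instead of issuing an atomic write on every iteration. */
            if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0 &&
                atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
                return;
        }
        sched_yield();   /* spin budget exhausted: give up the time slice */
    }
}

static void spin_release(spin_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

static spin_lock demo_lock;
static long demo_counter;   /* ordinary variable, protected only by demo_lock */

static void *spin_worker(void *arg) {
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++) {
        spin_acquire(&demo_lock);
        demo_counter++;   /* tiny critical section: where spinning pays off */
        spin_release(&demo_lock);
    }
    return NULL;
}

/* Returns the final counter; it equals nthreads * per_thread only if the
   lock provides mutual exclusion. nthreads <= MAX_THREADS. */
static long spin_demo(int nthreads, int per_thread) {
    pthread_t tids[MAX_THREADS];
    atomic_store(&demo_lock.held, 0);
    demo_counter = 0;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, spin_worker, &per_thread);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    return demo_counter;
}
```

Spinning wins when the protected work is a few instructions; when the lock holder is preempted, every waiter burns its spin budget for nothing, which is the CPU-time cost mentioned above.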
The choice of the best sync model ALWAYS depends on the design of your
program. The best possible sync method is to design the execution flow to
minimize the need for synchronization and then to use a situationally
appropriate sync object or method at each remaining site. Work queues,
possibly based on IOCP, and state-based algorithms can be a good way to
achieve the first part: eliminating the need for sync. The
CRITICAL_SECTION object is a standard solution to the second.
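As an illustration of removing the need for sync by design, here is a sketch in portable C with pthreads (the names `slice`, `partitioned_sum` and `demo_sum_1_to_n` are hypothetical): each thread owns a disjoint slice of the input and its own result slot, so the only synchronization left is `pthread_join`.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16
#define DEMO_MAX 100000

typedef struct {
    const int *data;
    int begin, end;        /* half-open range [begin, end) owned by one thread */
    long long partial;     /* written only by the owning thread */
} slice;

static void *sum_slice(void *arg) {
    slice *s = arg;
    long long acc = 0;
    for (int i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* nthreads <= MAX_THREADS. No locks or atomics: pthread_join is the only
   synchronization, and it also makes each partial result visible. */
static long long partitioned_sum(const int *data, int n, int nthreads) {
    pthread_t tids[MAX_THREADS];
    slice parts[MAX_THREADS];
    long long total = 0;
    for (int t = 0; t < nthreads; t++) {
        parts[t].data = data;
        parts[t].begin = (int)((long long)n * t / nthreads);
        parts[t].end = (int)((long long)n * (t + 1) / nthreads);
        parts[t].partial = 0;
        pthread_create(&tids[t], NULL, sum_slice, &parts[t]);
    }
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += parts[t].partial;
    }
    return total;
}

static int demo_data[DEMO_MAX];

/* Sums 1..n (n <= DEMO_MAX) using nthreads disjoint slices. */
static long long demo_sum_1_to_n(int n, int nthreads) {
    for (int i = 0; i < n; i++)
        demo_data[i] = i + 1;
    return partitioned_sum(demo_data, n, nthreads);
}
```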
Regarding your perf tests, it is hard to say exactly what is going on
without seeing the code, but here are some things to consider:
Setting the concurrency to 1 will almost certainly improve the execution
time of this trivial test, because there is then no contention: a single
thread drains/fills the whole queue.
Also, consider how differently XP and 2K3 schedule threads when comparing
your results across the two machines. The server OS is more optimized for
doing more work with fewer context switches, but your algorithm, a tight
loop around GetQueuedCompletionStatus (GQCS) and a sync call, causes many
context switches and user-mode/kernel-mode transitions.
"Kürşat" <xx@xxxxxx> wrote in message
news:eJbOwAcvIHA.4876@xxxxxxxxxxxxxxxxxxxxxxx
You are right that IOCP already has a queue structure, so there is no
need to interact with another one. A second queue may be needed in some
rare cases, for example, when generated data must be enqueued to another
shared queue; in the case I tested, there was no difference between
writing and reading.
Let's set aside IOCP and consider the semaphore case. I think a
semaphore is an appropriate object for implementing such multithreaded
data access logic. What I don't understand is the performance decrease
when I run the same simple test on much more powerful hardware.
"m" <m@xxx> wrote in message news:OM1YlMTvIHA.516@xxxxxxxxxxxxxxxxxxxxxxx
I suggest that you think about what you are trying to measure with this
test. As a sync object for a stack, an IOCP is a terrible choice. First,
you'll need another sync object, in your case a CS (CRITICAL_SECTION), to
actually sync access to the stack; second, there is no benefit to using
multiple threads to manipulate a single data structure in this way.
IOCP becomes useful when you have many data structures, etc., that need
to be manipulated independently but you don't want a thread for each.
IOCP can also be used as a work queue, but in that case you should
measure the time required to call PostQueuedCompletionStatus, because
there is no need for another data structure to interact with.
"Kürşat" <xx@xxxxxx> wrote in message
news:efa%23VFLvIHA.4912@xxxxxxxxxxxxxxxxxxxxxxx
Hi,
Multithreading, or more generally providing concurrency, is very hard
to implement, and I have struggled with it for a while. There are tons
of issues to take care of.
Recently I ran some tests comparing the performance of several
synchronization objects, and the results are interesting. I want to
share my short test adventure with you and hear your comments and
advice:
Most large-scale server applications have one or more data queues
processed by multiple threads. I made a test to find the most
appropriate method for synchronizing access to a queue from multiple
threads. I used a stack object as the container and pushed 90.000 items
into it before every test. My tests and results are below:
SYSTEM : Intel Centrino Duo, 2GB RAM, Windows XP Professional (SP2)
- Using IOCP:
* Concurrency value is 0, so the OS decides how many threads may run
concurrently (0 allows as many as there are processors).
* 10 threads in the thread pool (each calls GetQueuedCompletionStatus
on the same IOCP).
* 900.000 completions posted (PostQueuedCompletionStatus) to the IOCP.
* A critical section is initialized with a spin count of 4000.
* The thread proc simply enters the critical section, pops an item
from the stack and leaves the critical section.
Results:
Average Duration: 969 ms
Average CPU Usage: 50%
- Using Semaphore:
* Everything is the same as in the IOCP test, except that
WaitForSingleObject replaces GetQueuedCompletionStatus and
ReleaseSemaphore replaces PostQueuedCompletionStatus.
Results:
Average Duration: 600 ms
Average CPU Usage: 60%
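A portable approximation of the semaphore variant, assuming POSIX `sem_t` in place of WaitForSingleObject/ReleaseSemaphore and a pthread mutex in place of the CRITICAL_SECTION. It assumes one semaphore release per stacked item plus one extra release per thread so that workers can exit; `sem_worker` and `sem_test` are hypothetical names, and timing is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define STACK_CAP 100000
#define MAX_THREADS 16

static int stack_items[STACK_CAP];
static int stack_top;                      /* protected by stack_mu */
static pthread_mutex_t stack_mu = PTHREAD_MUTEX_INITIALIZER;
static sem_t work_sem;                     /* one count per wake-up */

static void *sem_worker(void *arg) {
    long *popped = arg;                    /* this thread's private counter */
    for (;;) {
        while (sem_wait(&work_sem) == -1 && errno == EINTR)
            ;                              /* ~ WaitForSingleObject */
        pthread_mutex_lock(&stack_mu);     /* ~ EnterCriticalSection */
        int have = stack_top > 0;
        if (have)
            stack_top--;                   /* pop one item: the "work" */
        pthread_mutex_unlock(&stack_mu);   /* ~ LeaveCriticalSection */
        if (!have)
            return NULL;                   /* stack drained: exit wake-up */
        (*popped)++;
    }
}

/* Returns the number of items popped; nthreads <= MAX_THREADS,
   nitems <= STACK_CAP. */
static long sem_test(int nthreads, int nitems) {
    pthread_t tids[MAX_THREADS];
    long popped[MAX_THREADS] = {0};
    long total = 0;
    stack_top = 0;
    for (int i = 0; i < nitems; i++)
        stack_items[stack_top++] = i;      /* pre-fill, as in the test */
    sem_init(&work_sem, 0, 0);
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, sem_worker, &popped[t]);
    for (int i = 0; i < nitems + nthreads; i++)
        sem_post(&work_sem);               /* ~ ReleaseSemaphore(h, 1, NULL) */
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += popped[t];
    }
    sem_destroy(&work_sem);
    return total;
}
```

Every item costs two synchronization operations (a semaphore wait plus a lock/unlock pair), which is the extra sync object m points out is needed to guard the container itself.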
- Using InterlockedDecrement:
* I set a shared variable to 900.000 and decrement it with
InterlockedDecrement for every processed item.
Results:
Average Duration: 450 ms
Average CPU Usage: 70%
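The InterlockedDecrement variant can be sketched with C11 `atomic_fetch_sub` as a portable substitute (note that it returns the old value, while InterlockedDecrement returns the new one). Each decrement hands out a unique index, so no lock is needed to claim work. `claim_worker`, `interlocked_test` and `all_done` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_ITEMS 900000
#define MAX_THREADS 16

static atomic_long remaining;             /* shared work counter */
static unsigned char done[MAX_ITEMS];     /* slot i written only by its claimer */

static void *claim_worker(void *arg) {
    long *mine = arg;
    for (;;) {
        /* atomic_fetch_sub returns the OLD value; subtracting 1 gives what
           InterlockedDecrement would return (the NEW value). */
        long idx = atomic_fetch_sub(&remaining, 1) - 1;
        if (idx < 0)
            return NULL;                  /* counter exhausted */
        done[idx] = 1;                    /* "process" item idx */
        (*mine)++;
    }
}

/* nthreads <= MAX_THREADS, nitems <= MAX_ITEMS. */
static long interlocked_test(int nthreads, long nitems) {
    pthread_t tids[MAX_THREADS];
    long counts[MAX_THREADS] = {0};
    long total = 0;
    for (long i = 0; i < nitems; i++)
        done[i] = 0;
    atomic_store(&remaining, nitems);
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, claim_worker, &counts[t]);
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += counts[t];
    }
    return total;
}

/* True if every item in [0, nitems) was claimed. Call only after the
   threads have been joined. */
static int all_done(long nitems) {
    for (long i = 0; i < nitems; i++)
        if (!done[i])
            return 0;
    return 1;
}
```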
The same tests using InterlockedSList instead of the stack gave these
average durations:
IOCP: 969 ms
Semaphore: 562 ms
Interlocked: 234 ms
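For comparison with the SList results, here is a minimal Treiber-style lock-free stack in C11 atomics. It is a hypothetical sketch, not the Win32 SList implementation: the real SList header also carries a sequence number to defeat the ABA problem, which this version avoids only because the stack is filled single-threaded and nodes are never pushed back while threads are popping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE 90000
#define MAX_THREADS 16

typedef struct node { struct node *next; int value; } node;
typedef struct { _Atomic(node *) head; } lf_stack;

/* CAS-loop push; in this demo it is only called before the workers start. */
static void lf_push(lf_stack *s, node *n) {
    node *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n, memory_order_release, memory_order_relaxed));
}

/* CAS-loop pop. Safe here only because nodes are never freed or re-pushed
   while poppers run, so the ABA case cannot occur. */
static node *lf_pop(lf_stack *s) {
    node *old = atomic_load_explicit(&s->head, memory_order_acquire);
    while (old != NULL &&
           !atomic_compare_exchange_weak_explicit(
               &s->head, &old, old->next, memory_order_acquire, memory_order_acquire))
        ;   /* on failure, old was reloaded with the current head */
    return old;
}

static node pool[POOL_SIZE];
static lf_stack demo_stack;

static void *pop_worker(void *arg) {
    long *count = arg;
    while (lf_pop(&demo_stack) != NULL)
        (*count)++;
    return NULL;
}

/* Fills the stack single-threaded, drains it with nthreads poppers
   (nthreads <= MAX_THREADS) and returns the number of nodes popped. */
static long slist_test(int nthreads) {
    pthread_t tids[MAX_THREADS];
    long counts[MAX_THREADS] = {0};
    long total = 0;
    atomic_store(&demo_stack.head, (node *)NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].value = i;
        lf_push(&demo_stack, &pool[i]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, pop_worker, &counts[t]);
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += counts[t];
    }
    return total;
}
```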
...And now the interesting part: I ran the same tests on an Intel Xeon
quad-core 2GHz CPU, 4GB RAM, Windows Server 2003 Enterprise (SP2):
With stack:
IOCP: 1172 ms
Semaphore: 656 ms
Interlocked: 984 ms
With InterlockedSList:
IOCP: 1078 ms
Semaphore: 688 ms
Interlocked: 328 ms
Now I have some questions:
- Everyone says: "IOCP has the most efficient threading model, it scales
well, etc." As you can see, the semaphore performs much better than
IOCP. What do you think?
- What do you think about the negative impact of the Xeon system? I
expected superior performance but was completely disappointed.
Thanks in advance.