Re: ThreadPool Questions
- From: Peter Duniho <NpOeStPeAdM@xxxxxxxxxxxxxxxx>
- Date: Fri, 03 Aug 2007 21:16:55 -0700
Sorry...I posted my much more general reply before seeing this post.
Anyway...
reeset wrote:
I can definitely tell you that IO is insignificant.
So, having a thread for each CPU should help. As I mentioned, more threads than that won't.
This is very important. If you are running this code on a computer with only one CPU, and your code is truly CPU-bound, then more threads are only going to make things worse.
For the rest of this post, let's assume you've got extra CPUs you can use.
The record processing itself makes up the lionshare of the "work" being done by the application. It was part of the reason why I'd thought placing the items in a threadpool would provide some optimizations.
It could. What design you'll have to use depends on what the actual cost per-record for processing is. If each record takes seconds to process, then I think queuing a work item for each record is a very easy, workable solution.
You will have ordering issues, which are easily addressed. You will also want to take out the part of your code that stops every five records and waits for the other threads to catch up. Doing that negates much of the benefit of multi-threading. Finally, you will want to limit the thread pool size to have no more active threads than you have CPUs.
If each record processed takes something like 500 to 1000 milliseconds, then you're in a bit of a gray area. You _should_ see a benefit even if you queue one worker thread item per record, but your overhead will start to reduce your efficiency in a noticeable way.
If each record processed only takes tens of milliseconds or less, then thread scheduling and context switching could easily offset whatever gain you have from parallelizing the work, if you are only assigning a single record to each worker queue item.
Now, with all that out of the way...the design Samuel suggests seems like it should work reasonably well for you. Specifically, having a thread for i/o, feeding a queue that multiple worker threads can consume, should work nicely and prevents any need to create new worker items. You just start the worker threads and let them continue as long as there is data to process. This avoids having to worry about the exact relative cost of processing each record.
The one modification to his suggestion that I'd make is this: ordering the data is a lot easier than it seems that he's suggesting. The data can be kept in some kind of ordered list from the outset, left there in order the whole time. Here's a pseudo-code description of what I mean, using Samuel's design as the framework:
Reader thread:
while not end of data:
read record
add record to ordered list
pulse a monitor that the workers wait on
Worker thread:
while not end of data:
scan forward through the ordered list to find first unprocessed record
if the end of the list is reached, wait on the monitor and then scan again
process the record, mark it as having been processed
pulse a monitor that the consumer waits on
Writing thread:
while not end of data:
scan forward through the ordered list, handling records until reaching an unprocessed record or the end of the list
wait on the monitor and then scan again
In this way, the records always remain in the correct order. There is a slight inefficiency in that the consumer thread will occasionally have to wait at an unprocessed record even though another record later in the sequence has already been done, but assuming that the processing itself is really the bottleneck, this shouldn't really be a problem. IMHO, the simpler design more than makes up for any minor delays there (and of course the delay is only really noticed if it occurs at the very end of the processing sequence, since otherwise the consumer thread should easily catch up at the next opportunity).
You could use a LinkedList or List<> collection to keep the data ordered according to the input order of the records. Just add new data to the end of the collection. The LinkedList would be more efficient if you want to remove items from the front of the list as they are finally handled by the consumer.
You might be tempted to use a couple of Queues, but the need to keep the records in the same order as they are initially read would make that impractical, as you'd have to create some sort of intermediate data structure to ensure that processed records are added to the consumer's Queue in the same order in which they were removed from the reading thread's queue.
Other than that change, I think Samuel's suggestion may in fact be ideal for your situation.
Pete
.
- Follow-Ups:
- Re: ThreadPool Questions
- From: Peter Duniho
- Re: ThreadPool Questions
- References:
- ThreadPool Questions
- From: reeset
- Re: ThreadPool Questions
- From: Samuel R . Neff
- Re: ThreadPool Questions
- From: reeset
- ThreadPool Questions
- Prev by Date: Re: ThreadPool Questions
- Next by Date: Re: How do I set the focus on a particular tab in a tab control?
- Previous by thread: Re: ThreadPool Questions
- Next by thread: Re: ThreadPool Questions
- Index(es):
Relevant Pages
|