Re: Best way to process data, async IO and completion ports
- From: "m" <m@xxx>
- Date: Mon, 3 Nov 2008 19:48:56 -0500
When evaluating designs like this, remember that adding threads, unlike
adding coprocessors, does not offload work: unless you set affinity, all of
the threads execute on the _same_ CPUs.
The key question in your design is whether or not it is possible / useful to
split the work of processing a single file across multiple threads. If the
answer is yes, then quantize the units of work within each file; otherwise,
a whole file is a unit of work.
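As a rough sketch of what "quantizing" could look like (Python only for brevity; WorkUnit, work_units and the chunk size are names I made up for illustration, not anything from the original design), a file becomes either one unit or a list of offset/length ranges:

```python
from typing import Iterator, NamedTuple, Optional


class WorkUnit(NamedTuple):
    """Hypothetical description of one unit of work: a byte range of a file."""
    path: str
    offset: int
    length: int


def work_units(path: str, file_size: int,
               chunk_size: Optional[int]) -> Iterator[WorkUnit]:
    """Yield the units of work for one file.

    chunk_size=None means the file cannot usefully be split, so the whole
    file is a single unit. Otherwise the file is cut into fixed-size ranges,
    with a shorter final range if the size is not an exact multiple.
    """
    if chunk_size is not None and chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if chunk_size is None or chunk_size >= file_size:
        yield WorkUnit(path, 0, file_size)
        return
    for offset in range(0, file_size, chunk_size):
        yield WorkUnit(path, offset, min(chunk_size, file_size - offset))
```

Whether fixed-size ranges are a sensible split depends entirely on the data; record-oriented files would need boundaries that fall between records.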
Once you have units of work defined, you can decide which IO model you
want to use for each of the steps. For the DB insertion there is no choice,
since those calls are all synchronous. For the file read, you can use either
synchronous or asynchronous IO. In general, choose a synchronous model if
your unit of work is a whole file or your app cannot process arbitrary IO
completions, and choose asynchronous if your app can be written to be
'stateful'. If you are unsure, sync IO will always work and is usually
easier to code.
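To illustrate what being able to process arbitrary IO completions implies, here is a minimal sketch. It assumes a made-up task (counting newlines) and made-up names (FileState, StatefulProcessor); it does not use or imitate the real completion port API. The point is only that the handler keeps per-file state, so completions for different files can arrive interleaved and in any order:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileState:
    """Per-file state kept between completions (hypothetical)."""
    pending: int        # reads still outstanding for this file
    newlines: int = 0   # running result accumulated so far


class StatefulProcessor:
    """Accepts completed reads for any file, in any order."""

    def __init__(self) -> None:
        self._files: Dict[str, FileState] = {}
        self.finished: Dict[str, int] = {}

    def begin(self, path: str, chunk_count: int) -> None:
        # Record how many completions to expect before the file is done.
        self._files[path] = FileState(pending=chunk_count)

    def on_completion(self, path: str, data: bytes) -> None:
        # Called once per completed read; looks up the state for that file.
        state = self._files[path]
        state.newlines += data.count(b"\n")
        state.pending -= 1
        if state.pending == 0:
            self.finished[path] = state.newlines
            del self._files[path]
```

This only works so simply because counting is order-independent. If the processing needs the data in order, the state has to buffer or sequence the chunks as well, which is part of why the sync model is usually easier to code.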
Once all of these decisions are made, you can decide how your app will
detect that there is work to be done and how it will accomplish that work.
Either it will be a program run with parameters to process a file or set of
files, or it will be a service (daemon) that runs always and detects files
that need to be processed. Once it 'finds' a file to work on, then it can
either process it directly or assign it to a worker thread to be processed
in parallel. In general, a service will assign work to worker threads and a
batch program will just do the work directly.
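The two styles can be sketched side by side. This assumes a hypothetical process_file that just reads a file synchronously and returns its size, and a service that finds work by polling a directory; both are stand-ins for whatever the real app does:

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Iterable, Set


def process_file(path: str) -> int:
    """Hypothetical unit of work: read a whole file synchronously."""
    with open(path, "rb") as f:
        return len(f.read())


def run_batch(paths: Iterable[str]) -> Dict[str, int]:
    """Batch style: do the work directly on the calling thread."""
    return {path: process_file(path) for path in paths}


def run_service_pass(directory: str, pool: ThreadPoolExecutor,
                     seen: Set[str]) -> Dict[str, Future]:
    """Service style: one polling pass over a directory.

    Every file not seen before is handed to a worker thread; the caller
    keeps the returned futures to collect results later.
    """
    futures: Dict[str, Future] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if path not in seen and os.path.isfile(path):
            seen.add(path)
            futures[path] = pool.submit(process_file, path)
    return futures
```

A real service would call run_service_pass repeatedly (or use a change notification instead of polling) and would need some way to tell that a file has been completely written before picking it up.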
A thread pool is an efficient mechanism for dispatching work to worker
threads, and the number of threads should usually be chosen to maximize
throughput. Achieving maximum throughput is hard because throughput usually
depends on temporal and transient factors that can be hard to predict. A
good rule of thumb is to try to keep all CPUs busy by having a number of
worker threads equal to the number of CPUs divided by the fraction of time
each task is CPU bound (bias down).
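Taking the time each task is CPU bound as a fraction between 0 and 1, the rule of thumb works out as below. For example, 4 CPUs with tasks that are CPU bound half the time gives 4 / 0.5 = 8 threads. The function name is mine, and the CPU-bound fraction is something you have to measure or estimate yourself:

```python
import math
import os


def worker_thread_count(cpu_count: int, cpu_bound_fraction: float) -> int:
    """Rule of thumb: CPUs divided by the fraction of time a task is CPU bound.

    The result is rounded down ("bias down") and never drops below one.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if not 0.0 < cpu_bound_fraction <= 1.0:
        raise ValueError("cpu_bound_fraction must be in (0, 1]")
    return max(1, math.floor(cpu_count / cpu_bound_fraction))


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single CPU.
    cpus = os.cpu_count() or 1
    print(worker_thread_count(cpus, 0.5))
```

Treat the result as a starting point for benchmarking rather than a final answer, since the CPU-bound fraction itself shifts with load.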
"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:D839488E-3ABB-4BFD-B9EF-42BAB42B1492@xxxxxxxxxxxxxxxx
Thanks for the reply.
Here was my thinking with respect to the processing I put forth in my
original post. In order to make best use of the CPU you would like the data
it needs to process to be ready and available. In the processing part you
don't want to do a synchronous read and have the thread go into a wait
state. This means that by the time the processing thread comes around to
wanting to process another work item, the data should have already been
read. This could be accomplished by having one or more threads reading the
data synchronously and posting to the processor thread, or these reading
threads could be reading asynchronously and when the data becomes available
they could post to the processing thread (I guess another solution which
I've heard people mention is that the thread which is processing the
completed async IO processes the data itself). Wouldn't it be better to
request the data asynchronously so that you don't "waste" a thread from the
thread pool waiting on IO?
--
Thanks,
Nick
nicknospamdu@xxxxxxxxxxxxxxxx
remove "nospam" change community. to msn.com
"Jialiang Ge [MSFT]" wrote:
Good morning Nick,
This is a very good question, and it is also very difficult to answer
because there are many factors that can affect the "best way". For example:
How frequently will the work be called? How fast are steps 1 and 3? The
"best way" depends on the product context, thus I suggest that you
benchmark each major implementation and decide on the best one according to
the test results.
Below are my opinions based on some assumptions of the business context.
1. If #1 and #3 are really fast, I think you may even consider synchronous
IO operations in #1 and #3, and create one IO completion port for #2.
2. Having an I/O completion port for each step has the benefit that the
code logic can be clearer. In order to decide how many threads to have in
each pool, you may refer to Jeffrey Richter's article:
http://msdn.microsoft.com/en-us/library/cc500404.aspx
(see the section "How Many Threads in the Pool?")
The section introduces a way to dynamically determine the number of threads
in the thread pool.
Please let me know whether the above info is helpful to you or not.
Have a very nice day!
Best Regards,
Jialiang Ge (jialge@xxxxxxxxxxxxxxxxxxxx, remove 'online.')
Microsoft Online Community Support