Re: Multithreaded dataprocessing too slow... Help!
- From: "Stuart Carnie" <stuart.carnie@xxxxxxxxxxxxx>
- Date: Wed, 9 Nov 2005 16:52:09 -0700
Alex,
Are you saying this shared copy is going to be accessed by each of the threads
to recombine the data back into one dataset?
If that is the case, my thoughts would be:
Main Thread responsibilities:
1. Creates the thread pool.
2. Breaks up the data into batches and feeds the individual batches to the
thread pool of "worker threads".
3. Starts an "aggregator thread" that receives events from the
worker threads when they have finished processing a batch. It would be the
only thread to access the dataset to recombine the data. The reason I suggest
this is that you will have significant lock contention if you use a shared
copy of the dataset that each worker thread can add to directly. By
that, I mean you would have to lock the shared dataset for each row that is
added, which is expensive.
4. Each worker thread should raise an event when it is done, so the
aggregator thread can receive that worker's unit of work and copy the data
into the aggregated dataset.
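
The split of responsibilities above can be sketched roughly as follows. This is a Python illustration of the pattern, not the .NET code under discussion: a queue stands in for the "event" each worker raises, a plain list stands in for the dataset, and the names process_batch, aggregate and run (plus the row-matching rule itself) are made up for the example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel that tells the aggregator to stop


def process_batch(batch):
    """Hypothetical business rule: keep rows equal to some other row in the batch."""
    return [row for i, row in enumerate(batch)
            if any(row == other for j, other in enumerate(batch) if i != j)]


def aggregate(results, dataset):
    """Aggregator thread: the only code that ever touches `dataset`."""
    while True:
        item = results.get()
        if item is _DONE:
            break
        dataset.extend(item)  # no lock needed, single writer


def run(batches, workers=4):
    results = queue.Queue()
    dataset = []
    aggregator = threading.Thread(target=aggregate, args=(results, dataset))
    aggregator.start()

    def work(batch):
        # The "event": hand one finished unit of work to the aggregator.
        results.put(process_batch(batch))

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Consuming the iterator waits for all workers and surfaces errors.
            list(pool.map(work, batches))
    finally:
        # Always stop the aggregator, even if a worker failed.
        results.put(_DONE)
        aggregator.join()
    return dataset
```

Workers synchronise only once per batch (the queue put) rather than once per row, which is the point of routing everything through a single aggregator. The order in which batches arrive in the result is not guaranteed.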
Just some things to think about.
Cheers,
Stu
"Alex" <Alex@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:18DF0301-8BC3-4961-8B74-691B29DD9587@xxxxxxxxxxxxxxxx
> Hi! I need help with identifying a new way to process report data that
> currently is too slow to meet
> our production goals.
>
> In short, we have data in an Oracle database that we need to process in
> batches/waves that are identified
> by a certain group of columns. Once we identify a batch, there's
> business logic that is applied to it
> where each row in the batch is compared against every other row in the
> batch, and if the conditions are met this
> row is pushed to the MS SQL backend.
>
> Initially this code was designed as a single-threaded application, where a
> recursive function identified
> each batch and did the business processing and adding. As the volume grew
> and we had some large batches
> (3000-5000 rows) the report became almost impossible to generate.
>
> It was decided to convert the app to a multithreaded environment. The
> approach was to do the following:
>
> 1) A separate class was created to process the batches. It takes a predefined
> amount of data as the source
> to work with and uses the logic from the recursive function in the older
> version to process the data and
> populate a shared copy of the Dataset with the processed rows.
> 2) A thread pool management class was created that calculates ranges of
> records to be assigned to a copy of the above-mentioned class for
> processing. We currently have the limit set to 20 threads. Once all 20
> have been assigned data for processing, a loop checks the status of each
> thread, and when that status is Finished it assigns a new chunk of data
> to it for processing.
>    1) Set a certain limit on the data that goes into each thread (a chunk).
>    2) We look at the last record of that chunk (say 1000 records per
>    chunk/thread) and do a lookup (a select on the base table) to find where
>    record 1000's batch ends. So, if the batch ends somewhere at row number
>    1012, then this is the chunk that's assigned to the thread to process.
>    Then the next chunk starts
>    at 1013 and +1000 and the lookup process is repeated.
> 3) Once all data is processed and all working threads have a status of
> Finished, we push the data accumulated in the shared Datasource of those
> classes to SQL Server for processing by MS Reporting Services.
>
> I would really appreciate it if someone could suggest any new or better
> methods to handle this situation.
>
> Thanks in advance!
>
> Alex