Re: Multithreaded dataprocessing too slow... Help!


I assume you are on v1.1?

Hmmmm... Rows.Add (for 1.1 or 2.0) is not a thread-safe operation, so without
locking you are exposed to all sorts of corruption, even if it appears to work.

Don't have the thread that performed the work copy the data into the shared
dataset, otherwise you'll still have to lock it. Pass the unit of work off
to a specific thread that fills the global dataset. Make sure this is the
_only_ thread that accesses the shared dataset.
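
As a rough illustration of that hand-off, here is a minimal sketch in Python rather than .NET. It assumes a thread-safe queue is an acceptable way to pass units of work to the single aggregating thread. The names `process_batch`, `worker`, `aggregate` and `run` are hypothetical, and the even-number filter only stands in for the real business logic.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel telling the aggregator to stop


def process_batch(batch):
    """Hypothetical stand-in for the business logic: keep even values."""
    return [row for row in batch if row % 2 == 0]


def worker(batch, results):
    # A worker builds its own local list, then hands it off whole.
    results.put(process_batch(batch))


def aggregate(results, combined):
    # The only thread that ever writes to `combined`, so it needs no lock.
    while True:
        unit = results.get()
        if unit is _DONE:
            break
        combined.extend(unit)


def run(batches, max_workers=4):
    results = queue.Queue()
    combined = []
    aggregator = threading.Thread(target=aggregate, args=(results, combined))
    aggregator.start()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches:
            pool.submit(worker, batch, results)
    # Leaving the `with` block waits for every worker to finish.
    results.put(_DONE)
    aggregator.join()
    return combined
```

The workers never touch `combined`; the queue is the only shared object, and it is locked once per finished batch instead of once per row.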

Cheers,

Stu

"Alex" <Alex@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:36E92AFE-E96B-49CD-88E5-99B9BEB65666@xxxxxxxxxxxxxxxx
> Stu, thanks for the quick reply! As I understand it, you want me to keep a
> local (non-shared) copy of the dataset in each thread and, when the thread
> finishes processing its portion of the data, copy that local copy into one
> global dataset kept in the thread pool manager. That sounds like a good
> idea. I'll definitely try this to see if performance improves. I wonder why
> I did not have any problems with the threads locking and unlocking the
> shared dataset. I did not use SyncLock on the dataset and it still seems to
> work.
>
> thanks again!
>
> "Stuart Carnie" wrote:
>
>> Alex,
>>
>> Are you saying this shared copy is going to be accessed by each of the
>> threads to recombine the data back into one dataset?
>>
>> If that is the case, my thoughts would be:
>>
>> Main Thread responsibilities:
>>
>> 1. Creates Thread pool.
>> 2. Breaks up data into batches and feeds the individual batches to the
>> thread pool of "worker threads"
>>
>> 3. Main thread starts an "aggregator thread" that receives events from the
>> worker threads when they have finished processing a batch. It would be the
>> only thread to access the dataset to recombine the data. The reason I
>> suggest this is that you will have significant lock contention if each
>> worker thread adds directly to a shared copy of the dataset. By that, I
>> mean you would have to lock the shared dataset for each row that is
>> added - expensive.
>>
>> 4. Each worker thread should raise an event when it is done, so the
>> aggregator thread can receive its unit of work and copy the data into the
>> aggregated dataset.
>>
>> Just some things to think about.
>>
>> Cheers,
>>
>> Stu
>>
>> "Alex" <Alex@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> news:18DF0301-8BC3-4961-8B74-691B29DD9587@xxxxxxxxxxxxxxxx
>> > Hi! I need help identifying a new way to process report data that is
>> > currently too slow to meet our production goals.
>> >
>> > In short, we have data in an Oracle database that we need to process in
>> > batches/waves identified by a certain group of columns. Once we identify
>> > a batch, business logic is applied to it: each row in the batch is
>> > compared against every other row in the batch, and if the conditions
>> > are met the row is pushed to the MS SQL backend.
>> >
>> > Initially this code was designed as a single-threaded application, where
>> > a recursive function identified each batch and did the business
>> > processing and adding. As the volume grew and we had some large batches
>> > (3000-5000 rows), the report became almost impossible to generate.
>> >
>> > It was decided to convert the app to a multithreaded environment. The
>> > approach was to do the following:
>> >
>> > 1) A separate class was created to process the batches. It takes a
>> > predefined amount of data as the source to work with and uses the logic
>> > from the recursive function in the older version to process the data
>> > and populate a Shared copy of a Dataset with the processed rows.
>> > 2) A thread pool management class was created that calculates ranges of
>> > records to be assigned to a copy of the above-mentioned class for
>> > processing. We currently have the limit set to 20 threads. Once all 20
>> > have been assigned data for processing, a loop checks the status of
>> > each thread, and when that status is Finished it assigns a new chunk of
>> > data to it for processing. The ranges are calculated as follows:
>> >    a) Set a certain limit on the data that goes into each thread (a
>> >    chunk).
>> >    b) We look at the last record of that chunk (say 1000 records per
>> >    chunk/thread) and look up where that record's batch ends (a select
>> >    on the base table). So, if the batch ends at row number 1012, then
>> >    rows up to 1012 form the chunk that is assigned to the thread to
>> >    process. The next chunk then starts at 1013, runs for another 1000
>> >    rows, and the lookup is repeated.
>> > 3) Once all data is processed and all worker threads have a status of
>> > Finished, we push the data accumulated in the shared Datasource of
>> > those classes to SQL Server for processing by MS Reporting Services.
>> >
>> > I would really appreciate it if someone could suggest any new or better
>> > methods to handle this situation.
>> >
>> > Thanks in advance!
>> >
>> > Alex
>>
>>
>>
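
The chunking rule from Alex's original post (cut roughly every 1000 rows, then extend the chunk to the end of the batch that its last row belongs to) might be sketched as below. This is a Python illustration under stated assumptions, not the poster's code: the batch identifiers are held in an in-memory list instead of being looked up with a select on the base table, rows of one batch are assumed to be contiguous, and `chunk_ranges` is a hypothetical name.

```python
def chunk_ranges(batch_keys, chunk_size):
    """Return inclusive (start, end) index pairs covering batch_keys.

    batch_keys[i] is the batch identifier of row i. Rows of one batch
    are assumed to be contiguous (for example, sorted by batch columns).
    """
    ranges = []
    n = len(batch_keys)
    start = 0
    while start < n:
        # Tentative cut after chunk_size rows (or at the end of the data).
        end = min(start + chunk_size, n) - 1
        # Extend the chunk until the batch of its last row is complete.
        while end + 1 < n and batch_keys[end + 1] == batch_keys[end]:
            end += 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

Because no batch is ever split across two chunks, each worker can compare rows within a batch without needing data held by another thread.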