Re: Walking Dataset, faster/better way?! Help!!!



I thought this was a bug, but thinking about your data set, it is rather
large. Since you perform n comparisons for each of the 7000 items, that
works out to around 50 million comparisons (7000 x 7000 = 49 million). You
probably need to restructure your logic to cut down on the number of
comparisons. For instance, one optimization you can do is prune the data
set by removing dupes; every dupe amounts to wasted comparisons. You can
also break the dataset into two pieces that can run concurrently instead of
one whole piece that executes sequentially. There are a couple of other
optimizations; it will mostly boil down to trial and error.
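
As a rough illustration of the pruning idea above, here is a minimal sketch.
It is written in Python rather than .NET purely to keep it self-contained,
and it assumes that rows "qualify" when they have equal values in the chosen
columns; the thread does not say what the real comparison is. The function
name find_matching_rows, the key_columns parameter, and the sample rows are
all made up for the example.

```python
from collections import defaultdict


def find_matching_rows(rows, key_columns):
    """Return rows whose key-column values also occur in another distinct row.

    Assumption (not from the thread): "qualifying" means equality on
    key_columns. Rows are plain dicts with hashable values.
    """
    # Step 1: prune exact duplicates so they never cost a comparison.
    seen = set()
    unique_rows = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique_rows.append(row)

    # Step 2: bucket rows by their key-column values in a single pass.
    buckets = defaultdict(list)
    for row in unique_rows:
        key = tuple(row[column] for column in key_columns)
        buckets[key].append(row)

    # Step 3: only buckets holding more than one row contain matches.
    matches = []
    for group in buckets.values():
        if len(group) > 1:
            matches.extend(group)
    return matches


# Hypothetical sample data: rows 1 and 2 agree on name and city,
# and row 3 appears twice as an exact duplicate.
sample_rows = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 2, "name": "a", "city": "x"},
    {"id": 3, "name": "b", "city": "y"},
    {"id": 3, "name": "b", "city": "y"},
    {"id": 4, "name": "c", "city": "z"},
]
matches = find_matching_rows(sample_rows, ["name", "city"])
print([row["id"] for row in matches])  # prints [1, 2]
```

With equality as the test, grouping replaces the n x n walk with a single
pass over the rows. A fuzzier comparison would still need pairwise checks,
but only within each bucket, and separate buckets are independent, so they
could be handed to different threads.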

--
Regards,
Alvin Bruney [MVP ASP.NET]

[Shameless Author plug]
The Microsoft Office Web Components Black Book with .NET
Now Available @ www.lulu.com/owc
Forth-coming VSTO.NET - Wrox/Wiley 2006
-------------------------------------------------------



"Alex" <Alex@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:68F70CF7-4416-4503-BDB4-B4817A9DA7FA@xxxxxxxxxxxxxxxx
> Thanks for the reply, Alvin. I was wondering what you are referring to
> when you mention replacing the for loop with a datatable find. Are you
> talking about using select on the datatable? I am using that already in
> my lookups to narrow the list of items down, and I am using
> multi-threading since the code usually runs on an 8-CPU box. The problem
> seems to surface any time we hit batches of data that are several
> thousand rows long. For example, I have a batch of 7000 rows right now;
> it has been running for 12 hours and is still not done. Granted, it's
> running on my local PC, which is not that big (2.8 GHz, 1 GB RAM), but my
> point is that it seems to take forever and I cannot identify where the
> problem is. I've used various logging to identify which section of code
> takes more time, but most sections process really quickly; it's only in
> combination, when using the comparison logic I have, that they end up
> taking forever.
> Thanks for any suggestions!
>
>
> "Alvin Bruney - ASP.NET MVP" wrote:
>
> > The way I implemented this was to first clone a copy of the dataset.
> > Then I built a function that takes the dataset as a parameter and
> > returns a new dataset with the results of the comparison. Inside the
> > function, I started with a for loop and a basic comparison. If the
> > item was what I was looking for, I removed it from the cloned table
> > and added a copy to my new dataset (this is what I return). I set the
> > primary key on the returned dataset to prevent the addition of
> > duplicates. My function looked something like this:
> > BuildComparisonList(dataset ds, dataset ds2, ref temp)
> > and I would call it as
> > BuildComparisonList(A,B, list)
> > I would run this on one thread, then fire the method on another
> > thread, this time calling BuildComparisonList(B,A). The idea was to
> > use B and compare against A, but at the same time use A and compare
> > against B. Inside the function, I protected the vulnerable parts
> > (deletes and additions) with locking. My timing showed that this was
> > significantly faster than moving the logic to the database when there
> > were several thousand rows to be compared and a significant number
> > resulted in a match.
> >
> > Part of the optimization came from removing the record on each
> > successful comparison, which left a steadily decreasing subset to
> > compare. Another optimization that made this sing was to remove the
> > for loop and replace it with a datatable find. The performance
> > improved by an order of magnitude. The final implementation ran
> > circles around a db implementation running on an Informix 9x engine
> > on a quad box.
> >
> > I have the code somewhere and can dig it up if you ain't scared of
> > threads! The original implementation actually found the difference
> > between two items that were equal.
> >
> > --
> > Regards,
> > Alvin Bruney [MVP ASP.NET]
> >
> > [Shameless Author plug]
> > The Microsoft Office Web Components Black Book with .NET
> > Now Available @ www.lulu.com/owc
> > Forth-coming VSTO.NET - Wrox/Wiley 2006
> > -------------------------------------------------------
> >
> >
> >
> > "ChadCThomas" <ChadCThomas@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> > news:CC09276D-AA11-4DD2-8A39-AA3BF5201B60@xxxxxxxxxxxxxxxx
> > > I had a similar issue a while back. We got some enhancement using a
> > > DataTable and not a DataSet, but we got the most gain by doing the
> > > analysis within the DB itself.
> > >
> > > T-SQL and PL/SQL are better suited for working with data.
> > >
> > > "Alex" wrote:
> > >
> > > > Hi! I've been looking for a solution to this problem for a while
> > > > now. I hope that someone out there knows a more efficient way to
> > > > do this.
> > > > In short, I have a Dataset with n records. I need to go one row
> > > > at a time, comparing values in some of its columns against
> > > > similar columns in all other rows in the same Dataset. Then I
> > > > move to the next row and do the same again. When I find some
> > > > qualified records, I add those rows to another local Dataset that
> > > > is pushed back to SQL once the process is complete. The problem,
> > > > as you can see, is that if there are thousands of rows in the
> > > > original dataset, I end up processing n x n rows, which really
> > > > hinders performance and takes too long to process. If someone
> > > > knows a better way to do such a comparison, please let me know.
> > > > The source data comes from Oracle, but is stored in a regular
> > > > Dataset, so I'm open to any suggestions.
> > > >
> > > > thanks in advance...
> > > >
> > > > Alex
> >
> >
> >

