Re: ArrayList BinarySearch vs Contains

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance



Hi Peter,

<snip>

When a collection is small, there is more of a chance that the data will
coalesce in a way that undermines the effectiveness of an algorithm. Larger
collections may have a better distribution of data, which can increase the
effectiveness of an algorithm much like how larger data sets provide more
accurate results, probabilistically.

Have I understood you correctly?

I don't think so. :) I don't recall writing anything that tried to address
*why* the order of an algorithm is accurate only for large data sets. Only
that it is.

Yes, I see that.

I jumped to conclusions when I read "you can't rely on the order when the data
set is small".

That sounded to me just like how probability works (expressed inversely) - the
larger the sample, the more accurate the prediction. I saw the big O notation
as a prediction of an algorithm's effectiveness on any given set when I wrote
that response.

After much mulling, I see now (again) that the notation describes the number
of steps that an algorithm might require in a given case, but not the chance
of that case actually happening. Cases such as "best", "worst" and "average"
can be expressed by the big O notation. The chances that any of those cases
will occur depends on other factors that simply aren't expressed by the
notation, such as the amount of data, its structure or its distribution within
the set.

As for the whys, I admit once again to not being an expert on the topic.
However, if I recall my studies correctly the reason that the order applies
to infinitely large data sets is not because of a statistical convergence.
It's because when you analyze the cost of an algorithm, there are a variety
of components to that cost, but as the data set gets infinitely large, the
contribution of those costs increase unevenly. One eventually dominates,
and that's the one that is expressed as the order of the algorithm.

Great response, thanks.

If it were an issue of probabilities, then (for example) one would not talk
about the "average" or "worst-case" order of an algorithm. The "worst-case"
wouldn't exist, since you'd be considering only random distribution of the
input data, and an infinitely large data set. But as we've already seen,
for Quicksort even with an infinitely large data set, one can still have an
order that is different from a statistically random and large sample.

Got it.

<snip>

--
Dave Sexton


.



Relevant Pages

  • Re: Big Push for Guest Worker Program
    ... so that I wouldn't every have to do taxes again. ... Since my big distribution comes in November, ... I remember my car license plate ID by an algorithm rather ...
    (soc.retirement)
  • Re: GetHashCode for Objects that Compare Based on Value (Not reference equality)
    ... Further, for those sets of data, an XOR hash produces a sufficiently ... random distribution for hashes to work just fine. ... realm of developing a good generic algorithm; ... Saying that XOR produces collisions is not very useful. ...
    (microsoft.public.dotnet.framework)
  • Re: GetHashCode for Objects that Compare Based on Value (Not reference equality)
    ... one cannot characterize 'all data' or even 'real life ... my algorithm you quickly get a broader range. ... the definition of random distribution, and thus should data in member ... It follows that given a class with random member variable values, ...
    (microsoft.public.dotnet.framework)
  • Re: Is Itos Lemma correct?
    ... your calculations? ... What algorithm did you use for your random number generator? ... Here's the original location and distribution of Mr. Steve Park: ... It's characterized as "a very accurate approximation of the normal idf". ...
    (sci.math)
  • Re: Locating Nearest Neighbors in space (fast)
    ... The brute force algorithm that you described is O, ... so the nearest neighbour distribution is very wide. ... If your distribution of nearest neighbour separations is heterogeneous (like ... stars) then I'd recommend an octree or k-D tree (adaptively subdividing ...
    (comp.programming)