Re: ArrayList BinarySearch vs Contains
- From: "Dave Sexton" <dave@jwa[remove.this]online.com>
- Date: Sun, 12 Nov 2006 04:20:15 -0500
Hi Peter,
<snip>
When a collection is small, there is more of a chance that the data will
coalesce in a way that undermines the effectiveness of an algorithm. Larger
collections may have a better distribution of data, which can increase the
effectiveness of an algorithm much like how larger data sets provide more
accurate results, probabilistically.
Have I understood you correctly?
I don't think so. :) I don't recall writing anything that tried to address
*why* the order of an algorithm is accurate only for large data sets. Only
that it is.
Yes, I see that.
I jumped to conclusions when I read "you can't rely on the order when the data
set is small".
That sounded to me just like how probability works (expressed inversely) - the
larger the sample, the more accurate the prediction. I saw the big O notation
as a prediction of an algorithm's effectiveness on any given set when I wrote
that response.
After much mulling, I see now (again) that the notation describes the number
of steps that an algorithm might require in a given case, but not the chance
of that case actually happening. Cases such as "best", "worst" and "average"
can be expressed by the big O notation. The chances that any of those cases
will occur depends on other factors that simply aren't expressed by the
notation, such as the amount of data, its structure or its distribution within
the set.
As for the whys, I admit once again to not being an expert on the topic.
However, if I recall my studies correctly the reason that the order applies
to infinitely large data sets is not because of a statistical convergence.
It's because when you analyze the cost of an algorithm, there are a variety
of components to that cost, but as the data set gets infinitely large, the
contribution of those costs increase unevenly. One eventually dominates,
and that's the one that is expressed as the order of the algorithm.
Great response, thanks.
If it were an issue of probabilities, then (for example) one would not talk
about the "average" or "worst-case" order of an algorithm. The "worst-case"
wouldn't exist, since you'd be considering only random distribution of the
input data, and an infinitely large data set. But as we've already seen,
for Quicksort even with an infinitely large data set, one can still have an
order that is different from a statistically random and large sample.
Got it.
<snip>
--
Dave Sexton
.
- References:
- ArrayList BinarySearch vs Contains
- From: tshad
- Re: ArrayList BinarySearch vs Contains
- From: Carl Daniel [VC++ MVP]
- Re: ArrayList BinarySearch vs Contains
- From: tshad
- Re: ArrayList BinarySearch vs Contains
- From: Dave Sexton
- Re: ArrayList BinarySearch vs Contains
- From: Peter Duniho
- Re: ArrayList BinarySearch vs Contains
- From: Bill Butler
- Re: ArrayList BinarySearch vs Contains
- From: Dave Sexton
- Re: ArrayList BinarySearch vs Contains
- From: Peter Duniho
- Re: ArrayList BinarySearch vs Contains
- From: Dave Sexton
- Re: ArrayList BinarySearch vs Contains
- From: Peter Duniho
- Re: ArrayList BinarySearch vs Contains
- From: Dave Sexton
- Re: ArrayList BinarySearch vs Contains
- From: Peter Duniho
- ArrayList BinarySearch vs Contains
- Prev by Date: Re: C# MHTML Code
- Next by Date: Re: Visual Studio on Vista
- Previous by thread: Re: ArrayList BinarySearch vs Contains
- Next by thread: Re: ArrayList BinarySearch vs Contains
- Index(es):
Relevant Pages
|