Re: find the largest 1000 values
- From: George <George@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 29 Nov 2007 00:25:01 -0800
Thanks Tom,
Good point.
regards,
George
"Tom Widmer [VC++ MVP]" wrote:
George wrote:.
Hi Tom,
In your description below, there is one point which is missing which matters
to performance.
When you met with an element which is larger than the smallest element in
the 1000 element set, you need to replace the smallest one with the new
element, right? But then, in order to proceed with the loop and logics, you
need to find the new smallest element (previous 2nd smallest element), to
find this, you need to iterate the 1000 set again which takes another O(n)
time, right?
Firstly, I suggested keeping the largest 1000 elements in a set or heap,
which both have logarithmic push/pop time. Secondly, any deterministic
algorithm you apply n times on a fixed number of elements (such as 1000)
is by definition O(n), since runtime only increases linearly in n. If
you change the algorithm to talk about the largest m elements, rather
than the largest 1000 elements, then we could say it is:
O(n log m) worst case (produced when the n datapoints are in sorted
ascending order)
o(n) best case (it will be very close to this if n is much bigger than m
and if the 1000 highest numbers are fairly uniformly distributed
throughout this large sample, which sounds like your situation)
Tom
Please feel free to comment or correct me.
regards,
George
"Tom Widmer [VC++ MVP]" wrote:
George wrote:
Hello everyone,A linear time algorithm is trivial. Just iterate over the data,
I have a large data set of integers (about several Million), all of them are
positive integers from experiment. They are stored on a file on local
machine. I want to select the largest 1000 values then make analysis based on
the 1000 values.
Could anyone suggest some efficient solutions? I think sorting the several M
data in memory is not feasible.
comparing to the smallest of your existing list of the 1000 biggest
items, discarding the new element or discarding the smallest old one in
each case.
Keep track of the smallest of your 1000 items either with a std::heap or
std::set - since it has fixed size, all operations on it are constant time.
Tom
- References:
- Re: find the largest 1000 values
- From: Tom Widmer [VC++ MVP]
- Re: find the largest 1000 values
- From: George
- Re: find the largest 1000 values
- From: Tom Widmer [VC++ MVP]
- Re: find the largest 1000 values
- Prev by Date: Re: which way is more efficient to check bit value
- Next by Date: Re: find the largest 1000 values
- Previous by thread: Re: find the largest 1000 values
- Next by thread: Help. In ftp sessions, how to refresh directory list?
- Index(es):
Relevant Pages
|