Sorting algorithms was: Factorial
From: Bob O`Bob (filterbob_at_yahoogroups.com)
Date: 05/27/04
- Next message: Randy Birch: "Re: How to show the common dialog in the center of the window?"
- Previous message: Ken: "Re: Open txt file in VB"
- In reply to: Ulrich Korndoerfer: "Re: Factorial"
- Next in thread: J French: "Re: Sorting algorithms was: Factorial"
- Reply: J French: "Re: Sorting algorithms was: Factorial"
- Reply: Schmidt: "Re: Sorting algorithms was: Factorial"
- Reply: U-CDK_CHARLES\\Charles: "Re: Sorting algorithms was: Factorial"
- Reply: Larry Serflaten: "Re: Sorting algorithms was: Factorial"
- Reply: Ulrich Korndoerfer: "Re: Sorting algorithms was: Factorial"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 26 May 2004 20:46:24 -0700
Ulrich Korndoerfer wrote:
>
> Hi Olaf,
>
> Schmidt wrote:
>
> > > > In practise Mergesort does typically 50% more compares and
> > > > 100% more Read/Writes than Quicksort.
> >
> > > I haven't seen this analysis, so frankly I can't say one way or the other,
> > > although there is more to a sort algorithm than compares and moves. There
> > > are also "overhead" considerations, especially where heavy recursion is
> > > done.
> > From what I've measured (same Quicksort with and without recursion) the
> > recursion-overhead was max. 3-5%.
>
> I wrote a little test application (available on my website), which
> allows counting of comparisons, swaps and overall assignments, recursive
> calls and maximum stack depth. For example, on two recursive quick sort
> and merge sort implementations for a random array of longs with 100000
> entries the figures are:
>
> Algorithm : Quick sort
> Numbers : RN varying
> Array size : 100000
> Comparisons: 2391935
> Swaps : 393724
> Assignments: 1267506
> Sort calls : 86334
> Stack depth: 41
>
> Algorithm : Merge sort
> Numbers : RN varying
> Array size : 100000
> Comparisons: 1536478
> Swaps : 0
> Assignments: 3337856
> Sort calls : 199999
> Stack depth: 18
>
> Seems to be that these implementations show merge sort using nearly 3
> times more assignments (each swap counts 3 assignments) than quick sort,
> but doing only about 60% comparisons. Merge sort also calls itself more
> than 2 times often than quick sort does. In execution time, merge sort
> is outperformed by a factor of more than 2 (on my box: 72 ms for quick
> sort and 156 ms for merge sort).
Can you folks make some specific suggestions for a specific real-world need?
I repeatedly need to sort tab-separated-value files of 600megabytes or so,
sometimes up to a GB. And I expect this file size to creep up, perhaps
approaching 2GB as soon as a year from now. Though actually the ability to
sort even larger files might lead to an overall improvement in the efficiency
of the whole system it's a part of. Line lengths vary, though they're rarely
under 90 bytes, or over 1200.
There are usually nine columns (eight tabs/line) but I would hope for a
generalized solution where columns are adaptable. I usually need the sort to
ignore the first column (i.e., everything up to the first tab), then most of the
time sort by the others, in order. Actually just columns 2,3,4, then 5 would be
sufficient, since column 5 holds a unique identifier by design.
Occasionally other column orders are needed for different formats of records,
usually smaller file sizes, but commonality of the sort routines would more than
make up for these smaller one-off sort runs being slightly suboptimal.
Any suggestions? Maybe even sample code?
Bob
--
- Next message: Randy Birch: "Re: How to show the common dialog in the center of the window?"
- Previous message: Ken: "Re: Open txt file in VB"
- In reply to: Ulrich Korndoerfer: "Re: Factorial"
- Next in thread: J French: "Re: Sorting algorithms was: Factorial"
- Reply: J French: "Re: Sorting algorithms was: Factorial"
- Reply: Schmidt: "Re: Sorting algorithms was: Factorial"
- Reply: U-CDK_CHARLES\\Charles: "Re: Sorting algorithms was: Factorial"
- Reply: Larry Serflaten: "Re: Sorting algorithms was: Factorial"
- Reply: Ulrich Korndoerfer: "Re: Sorting algorithms was: Factorial"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|