Re: Sorting algorithms was: Factorial

From: U-CDK_CHARLES\\Charles (_at_cdksystems.com)
Date: 05/27/04


Date: Thu, 27 May 2004 07:25:31 -0700

On Wed, 26 May 2004 20:46:24 -0700, Bob O`Bob
<filterbob@yahoogroups.com> wrote:
> Ulrich Korndoerfer wrote:
>>
>
> Can you folks make some specific suggestions for a specific real-world need?
>
> I repeatedly need to sort tab-separated-value files of 600megabytes or so,
> sometimes up to a GB. And I expect this file size to creep up, perhaps
> approaching 2GB as soon as a year from now. Though actually the ability to
> sort even larger files might lead to an overall improvement in the efficiency
> of the whole system it's a part of. Line lengths vary, though they're rarely
> under 90 bytes, or over 1200.
>

I think you don't want to "sort" at all.

What you need is a filing system that is self sorting, with decent
performance.

If this is an academic project, DAGS on B+-trees. It's an n-way tree,
with n chosen to fit in the amount of data your OS reads from the HD in
a single read.

Way back in the dark ages of 720kB floppy disks, we chose between a
B-tree and hashing, but chose hashing as it saved a disc access, back
when dics access took seconds rather than ms. Most of the time, we
didn't care about the ordering.

If this is a commercial project, or an acedemic project where you can
spend money, there are a number of commercial libraries available that
do this for you.


Loading