Re: Excel Sort() Algorithm

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Harlan Grove (hrlngrv_at_aol.com)
Date: 07/31/04


Date: Sat, 31 Jul 2004 01:16:21 -0700


"Myrna Larson" <anonymous@discussions.microsoft.com> wrote...
>As far as "major" goes i meant a major change in the logic. IMO having
>to create and compare a 2nd key fits that description. That doesn't
>imply it's hard to program.

While I implemented it using a second array, all I did was augment the sort
keys by appending the original order as the lowest order element. Since the
original order necessarily has all distinct values, the augmented key also
has all distinct values, so original order within original keys would be
preserved. This isn't a major change in logic. This is a minor change in the
sort key which could have been fed through the same quicksort algorithm to
give a stable sort in terms of results.

>re your Q&D routine: It doesn't look much like the QuickSort routines
>I've written . . .
...
>As originally described, QS involve two loops, one searching from the
>top down for a value <= the target, another searching the array from
>the bottom up for a value > than the target, then swapping these 2
>values.

You mean 3 loops. You're forgetting your own outer, 'infinite' loop.

>You have only 1 loop, in which you move a lower value to a
>predetermined lower position (k). There's no loop to search for a
>higher value with which to swap.
>
>You have to study the code and figure out the logic all over again
>(well, I do, anyway)! IMO this makes it a "major" modification. Whose
>variation is this?

Fairly standard in Unix literature.

Bentley's "Programming Pearls" (ISBN 0-201-10331-1) page 112.

Kernighan & Ritchie's "The C Programming Language, 2nd Ed." (ISBN
0-13-110362-8) page 87.

>In my preliminary testing, your routine takes 33% more time to sort a
>randomized listed of 10,000 strings constructed with the formula
>=TEXT(RANDBETWEEN(0,999),"000") & TEXT(ROW()," 00000") and sorting on
>the 1st 3 characters than the "standard" routine I show below.

The multiple loop version involves fewer swaps than the single loop version.
For VBA variants containing strings that's a performance killer, but in
lower level languages like C with block memory moves it's not such a big
deal.

All quicksort requires is a partitioning of the elements such that the pivot
value is in its correct location when the partitioning has been completed.
That's what the 1-loop algorithm I used provides and what the 3-loop
algorithm you used provides. They just do it differently.

In a library quality procedure using block memory operations, the 1-loop
algorithm could add an extra array index and defer swapping until it hits a
record not less than the pivot value, then shift the adjacent less-than
records as a single memory block down one record size chunk and copy the
k_th record into the hole opened above it. While the memory move/copy
operations may be nontrivial in themselves, there's much to be said for the
simplicity of the 1-loop approach.

However, it stinks in VBA. That said, I don't find yours looks a lot like a
translation of an assembler version. I'd prefer a more structured version
(granted with some performance tweaks).

Sub qsortproc1d( _
  arr As Variant, _
  Optional a As Long, _
  Optional z As Long _
)
'--------------------------------------------
'Q&D ascending order quick sort -- RECURSIVE!
'Based mostly on Robert Sedgewick's "Algorithms,
'2nd Ed." (ISBN 0-201-06673-4) page 118.
'--------------------------------------------
  Dim i As Long, j As Long
  Dim kv As Variant, pv As Variant, t As Variant

  If a > 0 And a >= z Then Exit Sub

  '-- error checking --
  If Not IsArray(arr) Then Exit Sub

  '-- optional arg default initializations --
  If a = 0 Then a = LBound(arr)
  If z = 0 Then z = UBound(arr)

  '-- avoid recursion and looping if only 2 items --
  If z - a = 1 Then
    If Left(arr(a), 3) > Left(arr(z), 3) Then
      t = arr(a)
      arr(a) = arr(z)
      arr(z) = t
    End If

    Exit Sub
  End If

  '-- select pivot value at random --
  i = a + Int((z - a + 1) * Rnd)

  '-- nothing to do if i = z --
  If i < z Then
    pv = arr(i)
    arr(i) = arr(z)
    arr(z) = pv
  End If

  '-- cheat: evaluate pivot value's key value once --
  kv = Left(pv, 3)

  i = a - 1
  j = z

  Do While i < j
    Do
      i = i + 1
    Loop While i < z And Left(arr(i), 3) < kv

    Do
      j = j - 1
    Loop While j > a And Left(arr(j), 3) > kv

    t = arr(i)
    arr(i) = arr(j)
    arr(j) = t
  Loop

  arr(j) = arr(i)
  arr(i) = pv
  arr(z) = t

  '-- skip recursive calls for single items --
  If i - a > 1 Then qsortproc1d arr, a, i - 1
  If z - i > 1 Then qsortproc1d arr, i + 1, z
End Sub

As for making this stable, I'd just translate the algorithm on page 9 of

http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf

but I'm too tired to do it tonight.



Relevant Pages

  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... and Richard made it clear that he understands the order ... >> of evaluation of a for loop. ... > using strlen but using an Oalgorithm when there is a trivial O ... >> In most other languages the terminating ...
    (comp.programming)
  • Re: Efficiency questions: combined ifs and looping 4 times
    ... Choice of algorithm always has a far bigger impact in performance than ... Use sizeof before the loop, store the result in a var and use ... speaks well of PHP. ... your order of complexity analysis is off. ...
    (comp.lang.php)
  • Re: count of each word occurred
    ... > take each word in order and compare it with the rest of the list ... Another version of this algorithm takes advantage of the fact that it does ... loop over all words in the list ... a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq ...
    (alt.comp.lang.learn.c-cpp)
  • Re: How to apply Distinct to a recordset?
    ... it's because each grouping of field values is a new filter and I will loop ... then filter again on each group to find the common ... I didn't thinkabout the expense side of the equation ... wise the ForEach iterator has to be a variant as you know. ...
    (microsoft.public.vb.database.ado)
  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... and Richard made it clear that he understands the order ... > of evaluation of a for loop. ... using strlen but using an Oalgorithm when there is a trivial O ... > condition is an end value and in these languages the strlen value ...
    (comp.programming)