Re: Running Count/Sum in a form

From: Michel Walsh (vanderghast_at_VirusAreFunnierThanSpam)
Date: 11/12/04


Date: Fri, 12 Nov 2004 09:30:22 -0500

Hi,

    The Group By cannot handle duplicated tuples, since it returns just
"one" typical row (illustrated by the use of LAST) per group. The correlated
sub-query will pump all the tuples. In the case of not-ambiguous ordering,
not duplication should occur, or else unique position (no duplicated
position) for ex-equo won't be not-ambiguous assignation.

    There are cases where a correlated sub-query are faster than a join, but
typically (whatever can be consider "typical" is illustrated here after) the
join is faster ( Henderson, "The Guru's Guide to Transact-SQL", page 143,
last paragraph). Cases where the correlated sub-query will be faster is when
it involves a special construct, like EXISTS, which can prune the search
(while the join may have to exhaust all the matching values, through the
index). We can expect the join to be faster than the correlated sub-query
since the join can immediate use indexes and walk down simultaneously, side
by side, the two indexes implied in the ON clause. The correlated sub-query
may miss that possibility in how to use efficiently the two indexes, and
thus, turn out to be slower. Sure, that assumes the fields are indexed, and
that table are larges (else, indexes won't be loaded).

    You can see the side-by-side walk of the two index A LITTLE BIT as if
you manually want to spot any difference between two ordered lists. You
compare the first entry of each list, then the second entry, of each list,
and so on, moving your "index" from the left hand over the first list down
and your index from the right hand, on the second list, until you miss a
difference. Once you spot a disruption, and solve it, you don't start over,
at the start of the list, for one of the index. No, you continue to where
you were. That is a side-by-side walk of the JOIN. The correlated sub-query
WILL (is likely to) start over. The time difference is not as enormous as it
may first appear, since falling back at the right position is relatively
fast: If the main index is at the letter M, the index to be synch won't move
over each record of the second list, one record at a time one, from the
start, but some resynchronization is still required none the less and makes
the correlated sub runs slower, in general.

    An BIG advantage of the join approach (that uses a GROUP BY), is that it
is entirely GRAPHICALLY reprehensible... well, when the join is an equi
join... (in MS SQL Server, non equi join are also graphically
reprehensible, but JET only graphically supports equi join).

Hoping it may help,
Vanderghast, Access MVP

"Jamie Collins" <jamiecollins@xsmail.com> wrote in message
news:2ed66b75.0411120449.5d000654@posting.google.com...
> "Michel Walsh" <vanderghast@VirusAreFunnierThanSpam> wrote ...
>
>> You need an ordering field, or group of fields, without duplication,
>> that determines uniquely the order.
>>
>>
>> SELECT LAST(a.f1), LAST(a.f2), LAST(a.f3), SUM(b.sumWhat)
>> FROM myTable As a INNER JOIN myTable As b
>> ON a.OrderingField >= b.OrderingField
>> GROUP BY a.OrderingField
>> ORDER BY a.OrderingField
>>
>>
>> That query is not updateable. Another query, updateable, but a little bit
>> slower, could be:
>>
>> SELECT a.*, (SELECT SUM(b.sumWhat)
>> FROM myTable As b
>> WHERE a.OrderingField >= b.OrderingField)
>> FROM myTable As a
>> ORDER BY a.OrderingField
>>
>
> Out of interest, have you tested this? The subquery approach is
> familiar to me, the other 'group by' approach not, so I did my own
> testing.
>
> I used a modest ten thousand row table with two columns. MyKeyCol is a
> sequential integer i.e. unique rows exist for values 1 to 10000.
> MyDataCol is a random integer between 1 and 10000 and there are
> duplicate values between rows. I decided to order on the MyDataCol
> and, because we are talking about a row ID, I changed the set function
> involved from SUM to COUNT (duplicates mean a not-so-neat row ID but
> makes for a better test <g>).
>
> Subquery version:
>
> SELECT T1.MyKeyCol, T1.MyDataCol, (
> SELECT COUNT(*) FROM 10K_row_table
> WHERE MyDataCol <= T1.MyDataCol
> ) AS row_ID
> FROM 10K_row_table T1
> ORDER BY T1.MyDataCol;
>
> Group by version:
>
> SELECT LAST(a.MyKeyCol), LAST(a.MyDataCol), COUNT(b.MyDataCol)
> FROM 10K_row_table As a
> INNER JOIN 10K_row_table As b
> ON a.MyDataCol >= b.MyDataCol
> GROUP BY a.MyDataCol
> ORDER BY a.MyDataCol;
>
> Because I'm only familiar with the subquery approach, I felt unable to
> tweak your group by version. [BTW shouldn't your subquery version as
> posted have the join expression less than or equal to?]
>
> The execution times (best of three) were as follows:
>
> Subquery: 1min 44sec
> Group by: 5min 36sec
>
> Additionally, the group by version only returned 9680 rows (a symptom
> of duplicates) and was not in row ID order. When I ordered on the
> unique/sequential column, I got similar times (the group by version
> ran 20secs faster) with correct results for both.
>
> Will you please you confirm this is a fair test and agree the subquery
> is superior approach due to it being updateable, faster and can handle
> duplicates.
>
> Many thanks,
> Jamie.
>
> --



Relevant Pages

  • Re: Boochs book feels too philosophical rather than practical?
    ... Robert Martin wrote: ... shapes and lists of operations on those shapes. ... duplication of one list you introduce duplication of another. ... There are indeed two lists. ...
    (comp.object)
  • Re: A Logical Model for Lists as Relations
    ... > with a rank per any explicit order, I am not even sure the ordinal ... If one has a numeric index that differs for each tuple, one never has duplication. ... In the end, though, I'm confident that the lists MS desires can be ...
    (comp.databases.theory)
  • Re: A Logical Model for Lists as Relations
    ... > with a rank per any explicit order, I am not even sure the ordinal ... If one has a numeric index that differs for each tuple, one never has duplication. ... lists provide nothing more than relations do ...
    (comp.databases.theory)
  • Re: Boochs book feels too philosophical rather than practical?
    ... shapes and lists of operations on those shapes. ... duplication of one list you introduce duplication of another. ... Procedural programming allows you to add new functions without changing the existing data structures. ...
    (comp.object)

Loading