Re: Counting with distinct - Optimizer Problems
From: Steve Kass (skass_at_drew.edu)
Date: 03/17/04
- Next message: Anand: "Re: IF UPDATE (column) in a trigger"
- Previous message: Greg Obleshchuk: "Re: Case statements"
- In reply to: Uri Dimant: "Re: Counting with distinct - Optimizer Problems"
- Next in thread: Uri Dimant: "Re: Counting with distinct - Optimizer Problems"
- Reply: Uri Dimant: "Re: Counting with distinct - Optimizer Problems"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 17 Mar 2004 00:53:16 -0500
Nice trick, Uri. ;)
I'll add that Shawn isn't missing anything, in my opinion. Queries with
both count(*) and count(distinct ...) are often real trouble for the
optimizer, and some user intervention is probably necessary. The view
trick will help with the count(*) and can be joined to the rest of the
query, but it won't speed up the count(distinct..). I don't know if
NOEXPAND and a join will force a good plan - my guess is it will help,
but whether this solution is better than user intervention without a
view depends on how often this query needs to be run and on how
frequently there are inserts and deletes from the table. In any case,
if Shawn uses this trick, the column to replace T.i should be the
primary key column or any column that is never or very infrequently
updated, so that only deletes and inserts will contribute to the overhead.
SK
Uri Dimant wrote:
>Shawn
>I'd recommend you create an indexed view .
>create table T (
> i int
>)
>go
>
>create view T_count with schemabinding
>as
> select cast(i as bit) as val, count_big(*) T_count from dbo.T
> group by cast(i as bit)
>go
>
>create unique clustered index T_count_uci on T_count(val)
>go
>insert into T
>select OrderID
>from Northwind..[Order Details]
>go
>
>select sum(T_count) from T_count with (noexpand)
>-- uses an efficient query plan on the materialized view
>
>go
>drop view T_count
>drop table T
>
>
>"Shawn Meyer" <smeyer@interelate.com> wrote in message
>news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl...
>
>
>>I have a table defined as :
>>
>>CREATE TABLE [dbo].[job_13_event] (
>> [id] [int] IDENTITY (1, 1) NOT NULL ,
>> [recipient_id] [int] NULL ,
>> [event_type] [int] NULL ,
>> [mail_type] [int] NULL ,
>>) ON [PRIMARY]
>>
>>There is about 5 million rows in this table;
>>1,063,337 have event_type=3;
>>
>>When I run the folowing statement it takes 13 seconds
>>to run. The execution plan shows a table spool/eager spool taking
>>32%.
>>
>>SELECT
>> count(recipient_id) mail_type_count,
>> count(distinct recipient_id) dist_count,
>> mail_type
>>from
>> job_13_event where event_type = 3
>>GROUP BY mail_type
>>
>>When I run the two counts as seperate statements, the count returns in 2
>>seconds
>>and the count distinct returns in 3.
>>
>>SELECT
>> count(recipient_id) mail_type_count,
>> mail_type
>>from
>> job_13_event where event_type = 3
>>GROUP BY mail_type
>>
>>SELECT
>> count(distinct recipient_id) dist_count,
>> mail_type
>>from
>> job_13_event where event_type = 3
>>GROUP BY mail_type
>>
>>If I do a join of the two statements and use FORCEPLAN,
>>I can get the results to come back in 7 Seconds. Here is the sql for
>>
>>
>that:
>
>
>>SET FORCEPLAN ON
>>select a.dist_count, a.mail_type, b.mail_type_count from
>>( SELECT
>> count(distinct recipient_id) dist_count,
>> mail_type
>> from job_13_event
>> where event_type = 3
>> GROUP BY mail_type) a,
>>( SELECT
>> count(1) mail_type_count,
>> mail_type
>> from job_13_event
>> where event_type = 3
>> GROUP BY mail_type ) b
>>where a.mail_type = b.mail_type
>>
>>I would think that the optimizer could calculate the first simple statment
>>by :
>>sorting by the mail_type and then recipient_id and doing a count for one
>>
>>
>and
>
>
>>a distinct count for the recipient_id
>>in one scan. In that case you would get the faster operation the count(1)
>>essentially for free, resulting in a near 4 second
>>result time.
>>
>>Is there something that I am missing that would make my first statement
>>
>>
>run
>
>
>>as fast or faster than using
>>the forceplan ???
>>
>>Thanks.
>>Shawn
>>
>>
>>
>>
>
>
>
>
- Next message: Anand: "Re: IF UPDATE (column) in a trigger"
- Previous message: Greg Obleshchuk: "Re: Case statements"
- In reply to: Uri Dimant: "Re: Counting with distinct - Optimizer Problems"
- Next in thread: Uri Dimant: "Re: Counting with distinct - Optimizer Problems"
- Reply: Uri Dimant: "Re: Counting with distinct - Optimizer Problems"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|