Left Outer Join: Index Seek not providing all index columns
From: Mark Andersen (markandersen_at_evare.com)
Date: 10/28/04
- Next message: Arzan: "Spaceused by each index."
- Previous message: jonjo: "stored procedure"
- Messages sorted by: [ date ] [ thread ]
Date: 28 Oct 2004 09:16:46 -0700
We have a left outer join query which is intended to prove
non-existence of a row in a related table. We are supplying 6 columns
to match the 6 keys of the index, and then checking in the where
clause that one of the 6 columns (a non-nullable column) is null to
prove non-existence.
The wrinkle is that the SQL Server 2000 query optimizer is generally
deciding to provide 5 of the columns (1-4 and 6) for the index seek,
getting back a bunch of rows from the dependent table, and then
separately executing a where condition on column 5. This makes no
sense.
As you will see below, a hack can be done to force the optimizer to
supply all 6 columns. But I would like to understand whether the
query analyzer is behaving well and whether this is a bug. Also, if
there is something to be improved in the way in which the
non-existence of related rows is checked, then perhaps that should be
done. (We've tried where not exists but with limited success).
====================
DDL:
CREATE TABLE [dbo].[EntExtractItems] (
[ExtractType] [int] NOT NULL ,
[EntID] [tEntID] NOT NULL ,
[PrivActID] [tPrivActID] NOT NULL ,
[IntKey1] [int] NOT NULL ,
[IntKey2] [int] NOT NULL ,
[IntKey3] [int] NOT NULL ,
[ExtractBatchID] [varchar] (32) COLLATE SQL_Latin1_General_CP1_CI_AS
NOT NULL ,
[msrepl_tran_version] [uniqueidentifier] NOT NULL
) ON [PRIMARY]
ALTER TABLE [dbo].[EntExtractItems] ADD
CONSTRAINT [DF__EntExtrac__msrep__2F869875] DEFAULT (newid()) FOR
[msrepl_tran_version],
CONSTRAINT [UPKNCL_EntExtractItemsIdx] PRIMARY KEY NONCLUSTERED
(
[ExtractType],
[EntID],
[PrivActID],
[IntKey1],
[IntKey2],
[IntKey3]
) ON [PRIMARY]
GO
CREATE INDEX [IX_EntExtractItems] ON
[dbo].[EntExtractItems]([ExtractBatchID]) ON [PRIMARY]
GO
<and>
CustodyTrustTxns..transactions (removed from post)
ALTER TABLE [dbo].[Transactions] ADD
CONSTRAINT [DF__Transacti__UpdDa__0F975522] DEFAULT (getdate()) FOR
[UpdDateTime],
CONSTRAINT [DF__Transacti__msrep__4B2D1C3C] DEFAULT (newid()) FOR
[msrepl_tran_version],
CONSTRAINT [UPKNCL_TransactionsIdx1] PRIMARY KEY NONCLUSTERED
(
[SeqNo]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_TransactionsSec] ON
[dbo].[Transactions]([EvSecCurrID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_Transactions] ON
[dbo].[Transactions]([PositionType], [AcctShortID], [SettleDate],
[TransactionType]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_Transactions2] ON
[dbo].[Transactions]([PositionType], [AcctShortID], [StmtMonth],
[AsOfDate]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_TransactionsSD] ON
[dbo].[Transactions]([SettleDate]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_TransactionsAOD] ON
[dbo].[Transactions]([AsOfDate]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_TransactionsInst] ON [dbo].[Transactions]([InstID])
WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IX_TransactionsTT] ON
[dbo].[Transactions]([TransactionType]) WITH FILLFACTOR = 90 ON
[PRIMARY]
GO
CREATE INDEX [IX_TransactionsLoad] ON
[dbo].[Transactions]([LoadBatchID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
---------------------------------
Running this query:
select * from custodytrusttxns..transactions t
left outer join entextractitems eei
on eei.entid=1463
and eei.privactid=2041
and eei.extracttype=1
and eei.intkey1=-101
and eei.intkey2=t.seqno
and eei.intkey3=-1
where eei.entid is null
and t.positiontype=1 and t.tradedate = '08/25/2004' and t.settledate
is null
and t.asofdate = '08/27/2004' and t.stmtmonth='200408'
Resulted in this plan:
|--Filter(WHERE:([eei].[EntID]=NULL))
|--Nested Loops(Left Outer Join,
WHERE:([t].[SeqNo]=[eei].[IntKey2]))
|--Filter(WHERE:((([t].[StmtMonth]='200408' AND
[t].[TradeDate]='08/25/2004') AND [t].[SettleDate]=NULL) AND
[t].[PositionType]=1))
| |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([CustodyTrustTxns].[dbo].[Transactions] AS [t]))
| |--Index
Seek(OBJECT:([CustodyTrustTxns].[dbo].[Transactions].[IX_TransactionsAOD]
AS [t]), SEEK:([t].[AsOfDate]='08/27/2004') ORDERED FORWARD)
|--Bookmark Lookup(BOOKMARK:([Bmk1001]),
OBJECT:([Infrastructure].[dbo].[EntExtractItems] AS [eei]))
|--Index
Seek(OBJECT:([Infrastructure].[dbo].[EntExtractItems].[UPKNCL_EntExtractItemsIdx]
AS [eei]), SEEK:([eei].[ExtractType]=1 AND [eei].[EntID]=1463 AND
[eei].[PrivActID]=2041 AND [eei].[IntKey1]=-101),
WHERE:([eei].[IntKey3]=-1) ORDERED FORWARD)
************************************
Two things are weird here:
a. There is a bookmark lookup on EntExtractItems. There should not be
as the index covers the query
b. The Index Seek on EntExtractItems is not supplied with IntKey2.
This results in numerous rows being returned from EntExtractItems.
Then the nested loop filters them out, but alas, that is after a lot
of cycles have been wasted. ***This is a big deal***
As an aside, I should note that the entire left outer join is
dismissed by the cost optimizer as not costly, although it does amount
to 30% or more of the actual query time when buffers are flushed. (Is
it possible the query optimizer does not think too hard about
sub-problems which it deems small.)
I have experimented with the query in various ways:
For example, the following query (suggested by Dan Tow's nice book on
SQL Tuning) causes the index seek to behave correctly:
select * from custodytrusttxns..transactions t
left outer join entextractitems eei
on eei.entid=1463+t.seqno*0
and eei.privactid=2041+t.seqno*0
and eei.extracttype=1+t.seqno*0
and eei.intkey1=-101+t.seqno*0
and eei.intkey2=t.seqno
and eei.intkey3=-1+t.seqno*0
where eei.entid is null
and t.positiontype=1 and t.tradedate = '08/25/2004' and t.settledate
is null
and t.asofdate = '08/27/2004' and t.stmtmonth='200408'
(For those not in the know, what I have done exchanged all the static
values for dynamic values which are identical.)
|--Filter(WHERE:([eei].[EntID]=NULL))
|--Bookmark Lookup(BOOKMARK:([Bmk1001]),
OBJECT:([Infrastructure].[dbo].[EntExtractItems] AS [eei]))
|--Nested Loops(Left Outer Join, OUTER
REFERENCES:([t].[SeqNo]))
|--Filter(WHERE:((([t].[StmtMonth]='200408' AND
[t].[TradeDate]='08/25/2004') AND [t].[SettleDate]=NULL) AND
[t].[PositionType]=1))
| |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([CustodyTrustTxns].[dbo].[Transactions] AS [t]))
| |--Index
Seek(OBJECT:([CustodyTrustTxns].[dbo].[Transactions].[IX_TransactionsAOD]
AS [t]), SEEK:([t].[AsOfDate]='08/27/2004') ORDERED FORWARD)
|--Index
Seek(OBJECT:([Infrastructure].[dbo].[EntExtractItems].[UPKNCL_EntExtractItemsIdx]
AS [eei]), SEEK:([eei].[ExtractType]=1+[t].[SeqNo]*0 AND
[eei].[EntID]=1463+[t].[SeqNo]*0 AND
[eei].[PrivActID]=2041+[t].[SeqNo]*0 AND
[eei].[IntKey1]=-101+[t].[SeqNo]*0 AND [eei].[IntKey2]=[t].[SeqNo] AND
[eei].[IntKey3]=-1+[t].[SeqNo]*0) ORDERED FORWARD)
This improves things considerably. IntKey2 is now provided by the
optimizer to the Index Seek.
It makes sense that I can trick the optimizer. I have not been able
to figure out why it does not supply the dynamic column itself. Is it
confused about having a 6 column index? Or are dynamic columns
generally disadvantaged in this sort of way?
I updated statistics, and checked:
SELECT 'Index Name' = i.name,
'Statistics Date' = STATS_DATE(i.id, i.indid)
FROM sysobjects o, sysindexes i
WHERE o.name = 'entextractitems' AND o.id = i.id
EntExtractItems NULL
IX_EntExtractItems 2004-10-15 11:08:17.683
_WA_Sys_PrivActID_54575F1A 2004-10-15 11:08:22.387
_WA_Sys_EntID_54575F1A 2004-10-15 11:08:22.543
_WA_Sys_IntKey1_54575F1A 2004-10-15 11:08:22.700
_WA_Sys_IntKey3_54575F1A 2004-10-15 11:08:22.903
_WA_Sys_IntKey2_54575F1A 2004-10-15 11:08:23.077
UPKNCL_EntExtractItemsIdx 2004-10-15 11:08:23.683
_WA_Sys_msrepl_tran_version_54575F1A 2004-05-19 16:24:05.830
Re-ran the queries. Same results.
Let's consider what happens when the # of dynamic columns changes:
select * from custodytrusttxns..transactions t
left outer join entextractitems eei
on
eei.intkey2=t.seqno
and eei.intkey1=-101+0*t.seqno
and eei.intkey3=-1+0*t.seqno
and eei.entid=1463
and eei.privactid=2041
and eei.extracttype=1
where eei.entid is null
and t.positiontype=1 and t.tradedate = '08/25/2004' and t.settledate
is null
and t.asofdate = '08/27/2004' and t.stmtmonth='200408'
Here I have 3 "dynamic columns" and 3 static columns. The static
columns are the leading columns on the index.
|--Filter(WHERE:([eei].[EntID]=NULL))
|--Nested Loops(Left Outer Join,
WHERE:(([t].[SeqNo]=[eei].[IntKey2] AND
[eei].[IntKey1]=-101+0*[t].[SeqNo]) AND
[eei].[IntKey3]=-1+0*[t].[SeqNo]))
|--Filter(WHERE:((([t].[StmtMonth]='200408' AND
[t].[TradeDate]='08/25/2004') AND [t].[SettleDate]=NULL) AND
[t].[PositionType]=1))
| |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([CustodyTrustTxns].[dbo].[Transactions] AS [t]))
| |--Index
Seek(OBJECT:([CustodyTrustTxns].[dbo].[Transactions].[IX_TransactionsAOD]
AS [t]), SEEK:([t].[AsOfDate]='08/27/2004') ORDERED FORWARD)
|--Bookmark Lookup(BOOKMARK:([Bmk1001]),
OBJECT:([Infrastructure].[dbo].[EntExtractItems] AS [eei]))
|--Index
Seek(OBJECT:([Infrastructure].[dbo].[EntExtractItems].[UPKNCL_EntExtractItemsIdx]
AS [eei]), SEEK:([eei].[ExtractType]=1 AND [eei].[EntID]=1463 AND
[eei].[PrivActID]=2041) ORDERED FORWARD)
So, here the dynamic columns all fail to partipate in the index seek.
select * from custodytrusttxns..transactions t
left outer join entextractitems eei
on
eei.intkey2=t.seqno
and eei.intkey1=-101+0*t.seqno
and eei.intkey3=-1+0*t.seqno
and eei.entid=1463+0*t.seqno
and eei.privactid=2041+0*t.seqno
and eei.extracttype=1
where eei.entid is null
and t.positiontype=1 and t.tradedate = '08/25/2004' and t.settledate
is null
and t.asofdate = '08/27/2004' and t.stmtmonth='200408'
Here, I only include the first column in the index statically
|--Filter(WHERE:([eei].[EntID]=NULL))
|--Bookmark Lookup(BOOKMARK:([Bmk1001]),
OBJECT:([Infrastructure].[dbo].[EntExtractItems] AS [eei]))
|--Nested Loops(Left Outer Join, OUTER
REFERENCES:([t].[SeqNo]))
|--Filter(WHERE:((([t].[StmtMonth]='200408' AND
[t].[TradeDate]='08/25/2004') AND [t].[SettleDate]=NULL) AND
[t].[PositionType]=1))
| |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([CustodyTrustTxns].[dbo].[Transactions] AS [t]))
| |--Index
Seek(OBJECT:([CustodyTrustTxns].[dbo].[Transactions].[IX_TransactionsAOD]
AS [t]), SEEK:([t].[AsOfDate]='08/27/2004') ORDERED FORWARD)
|--Index
Seek(OBJECT:([Infrastructure].[dbo].[EntExtractItems].[UPKNCL_EntExtractItemsIdx]
AS [eei]), SEEK:([eei].[ExtractType]=1 AND
[eei].[EntID]=1463+0*[t].[SeqNo] AND
[eei].[PrivActID]=2041+0*[t].[SeqNo] AND
[eei].[IntKey1]=-101+0*[t].[SeqNo] AND [eei].[IntKey2]=[t].[SeqNo] AND
[eei].[IntKey3]=-1+0*[t].[SeqNo]) ORDERED FORWARD)
That is enough for the optimizer to provide all columns to the index
seek.
select * from custodytrusttxns..transactions t
left outer join entextractitems eei
on
eei.intkey2=t.seqno
and eei.intkey1=-101+0*t.seqno
and eei.intkey3=-1+0*t.seqno
and eei.entid=1463
and eei.privactid=2041+0*t.seqno
and eei.extracttype=1
where eei.entid is null
and t.positiontype=1 and t.tradedate = '08/25/2004' and t.settledate
is null
and t.asofdate = '08/27/2004' and t.stmtmonth='200408'
Here, the leading two columns of the index are static.
|--Filter(WHERE:([eei].[EntID]=NULL))
|--Nested Loops(Left Outer Join,
WHERE:((([t].[SeqNo]=[eei].[IntKey2] AND
[eei].[IntKey1]=-101+0*[t].[SeqNo]) AND
[eei].[IntKey3]=-1+0*[t].[SeqNo]) AND
[eei].[PrivActID]=2041+0*[t].[SeqNo]))
|--Filter(WHERE:((([t].[StmtMonth]='200408' AND
[t].[TradeDate]='08/25/2004') AND [t].[SettleDate]=NULL) AND
[t].[PositionType]=1))
| |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([CustodyTrustTxns].[dbo].[Transactions] AS [t]))
| |--Index
Seek(OBJECT:([CustodyTrustTxns].[dbo].[Transactions].[IX_TransactionsAOD]
AS [t]), SEEK:([t].[AsOfDate]='08/27/2004') ORDERED FORWARD)
|--Bookmark Lookup(BOOKMARK:([Bmk1001]),
OBJECT:([Infrastructure].[dbo].[EntExtractItems] AS [eei]))
|--Index
Seek(OBJECT:([Infrastructure].[dbo].[EntExtractItems].[UPKNCL_EntExtractItemsIdx]
AS [eei]), SEEK:([eei].[ExtractType]=1 AND [eei].[EntID]=1463) ORDERED
FORWARD)
In this case, only the first two are provided to the index seek.
Couldn't the optimizer provide all columns to the index seek with
little additional effort? Is it seeing enough selectivity at two
columns that it gives up providing at that point?
Just for fun, I tried the "where not exists" version:
select * from custodytrusttxns..transactions t
where not exists (select * from entextractitems eei where
eei.intkey2=t.seqno
and eei.intkey1=-101
and eei.intkey3=-1
and eei.entid=1463
and eei.privactid=2041
and eei.extracttype=1)
and t.positiontype=1 and t.tradedate = '08/25/2004' and t.settledate
is null
and t.asofdate = '08/27/2004' and t.stmtmonth='200408'
|--Nested Loops(Left Anti Semi Join,
WHERE:([eei].[IntKey2]=[t].[SeqNo]))
|--Filter(WHERE:((([t].[StmtMonth]='200408' AND
[t].[TradeDate]='08/25/2004') AND [t].[SettleDate]=NULL) AND
[t].[PositionType]=1))
| |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([CustodyTrustTxns].[dbo].[Transactions] AS [t]))
| |--Index
Seek(OBJECT:([CustodyTrustTxns].[dbo].[Transactions].[IX_TransactionsAOD]
AS [t]), SEEK:([t].[AsOfDate]='08/27/2004') ORDERED FORWARD)
|--Index Seek(OBJECT:([Infrastructure].[dbo].[EntExtractItems].[UPKNCL_EntExtractItemsIdx]
AS [eei]), SEEK:([eei].[ExtractType]=1 AND [eei].[EntID]=1463 AND
[eei].[PrivActID]=2041 AND [eei].[IntKey1]=-101),
WHERE:([eei].[IntKey3]=-1) ORDERED FORWARD)
You will note that the bookmark lookup disappears, but the problem of
having the nested loop test the where clause rather than supplying it
to the index seek, persists. Here we get the "Left Anti Semi Join"
instead of the "Left Outer Join".
So, my questions are in short:
1. Why the bookmark lookup?
2. Why does the index seek not provide the dynamic column?
3. Is there something we are doing that is wrong or is this to be
expected when one mixes dynamic and static columns?
- Next message: Arzan: "Spaceused by each index."
- Previous message: jonjo: "stored procedure"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|