Re: CONTAINS performance

From: malcolm.earnest_at_bom.co.uk ((malcolm.earnest_at_bom.co.uk))
Date: 10/29/04


Date: Fri, 29 Oct 2004 06:53:30 -0700

Thank you again for your time and effort… there was a lot of useful information there.

With respect to predicting scalability I understand your argument that without the actual data that there can be a whole host of hidden “gotcha” situations… but I really do need to estimate this issue of scalability despite an accurate simulation being all-but impossible. I’m familiar with the TPC-C; TPC-W and generic stress tests etc, and I’m aware that random test data can be generated from word-lists etc. – but experience has shown that even after much effort this data still exhibits statistical distributions. These techniques will likely prove useful – though I hope to only go to that level of effort once I’ve implemented a system I believe should scale well.

Regarding performance of the query itself, I recognise your concern with the computational expense of the remote scans of the free-text catalogue – however my own concerns are primarily with performance of the relational join in the context of the free-text optimisation. I know my query suggested a search for an ordinary dictionary word – but in practice I expect the facility to be used to search for complex associations between words – and/or unusual proper nouns or distinctive codes like ISBNs; car registration plates or Swiss bank account numbers! With this in mind I expect that each free-text search will identify only a few records – even if the tables contain many millions. If SQLServer performs the join operation for every record in the tables before restricting based upon the free-text criteria then (IMHO) this will likely prove a scalability issue. If SQLServer is able to determine a mechanism to read result-set rows more efficiently (i.e. without accessing the records themselves but rather by pe
rforming the join on the indices and comparing those key values with the free-text matches) then this will improve performance and make possible larger scale deployments. I suspect, however, that the root problem remains – i.e. that every primary key in A is considered to resolve the outer join… (which I suspect should be unnecessary.) While I also need to be concerned about the increasing cost of accessing the free-text indices – the fact that I expect typical search terms to be ‘well-chosen’ encourages me to consider the potential requirement to compute the whole outer join (even if this is just over indexes and not the larger records themselves) to be the most pressing concern.

I’ve been looking at execution plans, but should admit that so far I’m finding them difficult to interpret. For example, I have some data for a query similar to my original post (but where indices are arranged to be slightly more amenable to optimisation.) I’ve been analysing:

Select * from A left outer join B on A.id=B.id left outer join C on A.id=C.id
Where contains(A.*,’word’) or contains (B.*,’word’) or contains (C.*,’word’);

The wall-clock timings for this query are not particularly helpful – sometimes taking several seconds (presumably when indices are not cached) but with repeated execution executing quickly on successive evaluations. It is important to note that this test was performed with only a few hundred records – whereas I ultimately want to support thousands to millions, and that for my test all the relevant data could easily fit into RAM. At the end of this post is the output from showplan_all – about which I’d welcome any comments. Am I interpreting this plan correctly in assuming it states that the outer-join is computed considering at least the keys for every record in A?

Thanks for the pointer to the whitepaper on SQLServer 2005 – it was interesting – though I think it best to defer judgement on the performance improvements until I see them for my own queries and schema. The advances certainly appear to offer substantial benefits but I doubt I can be sure what it means for me until I find opportunity to at least trial a Beta.

SET SHOWPLAN_ALL ON 3 1 0 NULL NULL 1 NULL NULL NULL NULL NULL NULL NULL NULL SETON 0 NULL
 select * from A left outer join B on B.ID = A.ID left outer join C on C.ID = A.ID where contains(A.*, '"word"') or contains(B.*, '"word"') or 4 2 0 NULL NULL 2 NULL 549.56439 NULL NULL NULL 1.5461341 NULL NULL SELECT 0 NULL
  |--Compute Scalar(DEFINE:([A].[FC]=[A].[FC], [A].[FU]=[A].[FU], [B].[FU]=[B].[FU], [C].[FU]=[C].[FU])) 4 3 2 Compute Scalar Compute Scalar DEFINE:([A].[FC]=[A].[FC], [A].[FU]=[A].[FU], [B].[FU]=[B].[FU], [C].[FU]=[C].[FU]) [A].[FC]=[A].[FC], [A].[FU]=[A].[FU], [B].[FU]=[B].[FU], [C].[FU]=[C].[FU] 549.56439 0.0 5.4956436E-5 2728 1.5461341 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
       |--Filter(WHERE:(([Expr1012] OR [Expr1013]) OR [Expr1014])) 4 5 3 Filter Filter WHERE:(([Expr1012] OR [Expr1013]) OR [Expr1014]) NULL 549.56439 0.0 3.1874733E-4 2745 1.5460792 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
            |--Nested Loops(Left Semi Join, WHERE:([Expr1012] OR [Expr1013])OUTER REFERENCES:([C].[FO]), DEFINE:([Expr1014] = [PROBE VALUE])) 4 6 5 Nested Loops Left Semi Join WHERE:([Expr1012] OR [Expr1013])OUTER REFERENCES:([C].[FO]), DEFINE:([Expr1014] = [PROBE VALUE]) [Expr1014] = [PROBE VALUE] 549.56439 0.0 2.5524213E-3 2745 1.5457604 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
                 |--Nested Loops(Left Semi Join, WHERE:([Expr1012])OUTER REFERENCES:([B].[FS]), DEFINE:([Expr1013] = [PROBE VALUE])) 4 7 6 Nested Loops Left Semi Join WHERE:([Expr1012])OUTER REFERENCES:([B].[FS]), DEFINE:([Expr1013] = [PROBE VALUE]) [Expr1013] = [PROBE VALUE] 610.62708 0.0 2.8360237E-3 2745 1.1154289 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
                 | |--Nested Loops(Left Semi Join, OUTER REFERENCES:([A].[ID]), DEFINE:([Expr1012] = [PROBE VALUE])) 4 8 7 Nested Loops Left Semi Join OUTER REFERENCES:([A].[ID]), DEFINE:([Expr1012] = [PROBE VALUE]) [Expr1012] = [PROBE VALUE] 678.47455 0.0 3.1511374E-3 2745 0.68396938 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
                 | | |--Merge Join(Left Outer Join, MANY-TO-MANY MERGE:([A].[ID])=([B].[ID]), RESIDUAL:([B].[ID]=[A].[ID])) 4 9 8 Merge Join Left Outer Join MANY-TO-MANY MERGE:([A].[ID])=([B].[ID]), RESIDUAL:([B].[ID]=[A].[ID]) NULL 753.86066 0.04586786 1.9198203E-2 2744 0.26548451 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
                 | | | |--Merge Join(Left Outer Join, MERGE:([A].[ID])=([C].[ID]), RESIDUAL:([C].[ID]=[A].[ID])) 4 10 9 Merge Join Left Outer Join MERGE:([A].[ID])=([C].[ID]), RESIDUAL:([C].[ID]=[A].[ID]) NULL 424.98514 0.0 7.0256959E-3 2249 0.12597823 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
                 | | | | |--Clustered Index Scan(OBJECT:([Cat].[dbo].[A].[PK]), ORDERED FORWARD) 4 11 10 Clustered Index Scan Clustered Index Scan OBJECT:([Cat].[dbo].[A].[PK]), ORDERED FORWARD [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ 312.0 5.0171092E-2 4.2170001E-4 1233 5.0592791E-2 [A].[ID], [A].[FD], [A].[FB], [A].[FC], [A].[FA], [A].[FT], [A].[FN], [A].[FL], [A].[F_], [A].[FV], [A].[FI], [ NULL PLAN_ROW 0 1.0
                 | | | | |--Sort(ORDER BY:([C].[ID] ASC)) 4 12 10 Sort Sort ORDER BY:([C].[ID] ASC) NULL 321.0 1.1261261E-2 4.270569E-3 1026 6.8356745E-2 [C].[FO], [C].[FQ], [C].[ID], [C].[FG], [C].[FX], [C].[FP], [C].[FJ], [C].[FW], [C].[FU], [C].[FK], [C].[X_D NULL PLAN_ROW 0 1.0
                 | | | | |--Clustered Index Scan(OBJECT:([Cat].[dbo].[C].[PK])) 4 13 12 Clustered Index Scan Clustered Index Scan OBJECT:([Cat].[dbo].[C].[PK]) [C].[FO], [C].[FQ], [C].[ID], [C].[FG], [C].[FX], [C].[FP], [C].[FJ], [C].[FW], [C].[FU], [C].[FK], [C].[X_D 321.0 2.6196657E-2 0.0002158 1042 5.2824914E-2 [C].[FO], [C].[FQ], [C].[ID], [C].[FG], [C].[FX], [C].[FP], [C].[FJ], [C].[FW], [C].[FU], [C].[FK], [C].[X_D NULL PLAN_ROW 0 1.0
                 | | | |--Sort(ORDER BY:([B].[ID] ASC)) 4 17 9 Sort Sort ORDER BY:([B].[ID] ASC) NULL 714.0 1.1261261E-2 1.0659463E-2 504 7.4437201E-2 [B].[FS], [B].[ID], [B].[FM], [B].[FY], [B].[FZ], [B].[FH], [B].[FR], [B].[FE], [B].[FF], [B].[F_], [B].[FJ], [X NULL PLAN_ROW 0 1.0
                 | | | |--Clustered Index Scan(OBJECT:([Cat].[dbo].[B].[PK])) 4 18 17 Clustered Index Scan Clustered Index Scan OBJECT:([Cat].[dbo].[B].[PK]) [B].[FS], [B].[ID], [B].[FM], [B].[FY], [B].[FZ], [B].[FH], [B].[FR], [B].[FE], [B].[FF], [B].[F_], [B].[FJ], [X 714.0 5.1652573E-2 0.0008639 520 5.2516475E-2 [B].[FS], [B].[ID], [B].[FM], [B].[FY], [B].[FZ], [B].[FH], [B].[FR], [B].[FE], [B].[FF], [B].[F_], [B].[FJ], [X NULL PLAN_ROW 0 1.0
                 | | |--Row Count Spool 4 22 8 Row Count Spool Lazy Spool NULL NULL 1.0 0.0 0.0000001 4 0.41533375 NULL NULL PLAN_ROW 0 753.86066
                 | | |--Index Spool(SEEK:([FULLTEXT:A].[KEY]=[A].[ID])) 4 23 22 Index Spool Eager Spool SEEK:([FULLTEXT:A].[KEY]=[A].[ID]) NULL 1.0 2.9842013E-2 7.9600002E-5 11 0.41530594 NULL NULL PLAN_ROW 0 278.02261
                 | | |--Remote Scan(OBJECT:(CONTAINS)) 4 24 23 Remote Scan Remote Scan OBJECT:(CONTAINS) NULL 1000.0 0.0 0.36333334 15 0.36333334 [FULLTEXT:A].[KEY] NULL PLAN_ROW 0 1.0
                 | |--Row Count Spool 4 25 7 Row Count Spool Lazy Spool NULL NULL 1.0 0.0 1.7798983E-2 4 0.42855564 NULL NULL PLAN_ROW 0 678.47455
                 | |--Index Spool(SEEK:([FULLTEXT:B].[KEY]=[B].[FS])) 4 26 25 Index Spool Eager Spool SEEK:([FULLTEXT:B].[KEY]=[B].[FS]) NULL 1.0 2.9842013E-2 7.9600002E-5 11 0.40765724 NULL NULL PLAN_ROW 0 181.93309
                 | |--Remote Scan(OBJECT:(CONTAINS)) 4 27 26 Remote Scan Remote Scan OBJECT:(CONTAINS) NULL 1000.0 0.0 0.36333334 15 0.36333334 [FULLTEXT:B].[KEY] NULL PLAN_ROW 0 1.0
                 |--Row Count Spool 4 28 6 Row Count Spool Lazy Spool NULL NULL 1.0 0.0 0.01716849 4 0.42759585 NULL NULL PLAN_ROW 0 610.62708
                      |--Index Spool(SEEK:([FULLTEXT:C].[KEY]=[C].[FO])) 4 29 28 Index Spool Eager Spool SEEK:([FULLTEXT:C].[KEY]=[C].[FO]) NULL 1.0 2.9842013E-2 7.9600002E-5 11 0.40763831 NULL NULL PLAN_ROW 0 181.69568
                           |--Remote Scan(OBJECT:(CONTAINS)) 4 30 29 Remote Scan Remote Scan OBJECT:(CONTAINS) NULL 1000.0 0.0 0.36333334 15 0.36333334 [FULLTEXT:C].[KEY] NULL PLAN_ROW 0 1.0

select *
from A
left outer join B on B.ID = A.ID
left outer join C on C.ID = A.ID
where contains(A.*, '"word"')
or contains(B.*, '"word"')
or contains(C.*, '"word"')

**********************************************************************
Sent via Fuzzy Software @ http://www.fuzzysoftware.com/
Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...


Loading