Re: Performance issue with CONTAINS



Simon, Hilary,

What is the difference between these two queries:
SELECT * FROM PERSON WHERE company_id = 7410 AND CONTAINS(first_name,
'Daniel')

SELECT person.* FROM CONTAINSTABLE(PERSON, first_name, 'Daniel') AS a
JOIN person ON a.key = person.person_id WHERE company_id = 7410

shouldn't they return the same results? Are they interchangeable? If
not, when should I use CONTAINS? When should I use CONTAINSTABLE?

Does the same phenomena happens with FREETEXT as well?

BTW, I'm assuming this thread applies to Sql Server 2005 here, right?

Thank you,
Tony.

Simon Sabin wrote:
Hello boblotz2001,

The lack of an ability to put the extra data into the index is a big pain
especially when the keyword is not very selective. Of the two options posed,
I would go for the addition of extra text in the search text i.e. COMP1234
It is more flexible and scalable. We implemented both and I am regretting
doing the partition one.

I woul always use containstable, I believe it simplifies the options the
optimiser has to consider.

You may want to look at SQL Turbo


Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons


Hillary,

Thanks for your reply... I am not sure if I will be able to implement
either solution. The first one is a bit drastic and carries a fair
amount of overhead with it. Besides, as you mentioned I do have a
good amount of companies. Also, doesn't sound like this change would
affect the execution plan. Index Server would still return a large
recordset and try to filter it down but now instead of millions of row
it would deal with thousands.

Second solution is also problematic because I gave company ID as an
example but there are other predicates that can be used in the query.
For example, the search might be zip_code and last name instead of
company_id and first_name. I don't want to have to include every
permutation of search criteria in the index.

I guess my point is why is this happening specifically? Is it the
size of the table, shortcoming of the technology, improper indexing on
the table, something else?

Thanks

Hilary Cotter wrote:

This is a well known phenomena known as trimming. The best way to
solve this is by creating an indexed view which contains the data you
are looking for, i.e. company_id=7410 or you partition your table
into different tables by company_id. For large numbers of company_id
this will not work.

Another option is to add company_id to your content and then search
like this

SELECT * FROM PERSON WHERE company_id = 7410 AND CONTAINS(first_name,
'Daniel' and '7410') takes 2.5 minutes.

--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"boblotz2001" <boblotz2001@xxxxxxxxx> wrote in message
news:1157486524.708356.37770@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Seemingly a simple thing but something is wrong:

SELECT * FROM PERSON WHERE company_id = 7410 AND
CONTAINS(first_name, 'Daniel') takes 2.5 minutes.

SELECT person.* FROM CONTAINSTABLE(PERSON, first_name, 'Daniel') AS
a JOIN person ON a.key = person.person_id WHERE company_id = 7410
takes under a second.

Person table has 1101665 Million rows. 3893 rows have company_id =
7410. 6836 rows contain 'Daniel'. Total of 32 rows are returned.

Catalog attributes:
Item Count: 1101665
Unique Key Count: 4271510
Catalog Size: 128 Mb
The second query runs as expected: person table performs an index
seek on the company_id index and returns 3893 rows, remote scan
returns 6836 rows and a Hash Match join is used to produce 32 rows.

The second query starts the same by using the company_id index and
returning 3893 rows but then things get ugly. Remote scan show
estimated number of rows as 6836 but returns 26,510,733 rows and a
filter is then applied to filter the number of rows to 32! Needless
to say that takes a while.

Need to add the fact that the catalog contains one table 'PERSON'
but
there are about
dozen columns in PERSON table being indexed.
Any thoughts?

Thanks


.



Relevant Pages

  • Re: Performance issue with CONTAINS
    ... Hilary Cotter wrote: ... BTW, I'm assuming this thread applies to Sql Server 2005 here, right? ... 'Daniel' and '7410') takes 2.5 minutes. ... The second query runs as expected: ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Performance issue with CONTAINS
    ... 'Daniel' and '7410') takes 2.5 minutes. ... Catalog Size: 128 Mb ... The second query runs as expected: person table performs an index seek ... filter is then applied to filter the number of rows to 32! ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Performance issue with CONTAINS
    ... The lack of an ability to put the extra data into the index is a big pain especially when the keyword is not very selective. ... CONTAINS(first_name, 'Daniel') takes 2.5 minutes. ... Catalog Size: 128 Mb ... The second query runs as expected: ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Performance issue with CONTAINS
    ... and try to filter it down but now instead of millions of row it would ... 'Daniel' and '7410') takes 2.5 minutes. ... Catalog Size: 128 Mb ... The second query runs as expected: person table performs an index seek ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Performance issue with CONTAINS
    ... It applies to SQL 2000 or SQL 2005. ... BTW, I'm assuming this thread applies to Sql Server 2005 here, right? ... 'Daniel' and '7410') takes 2.5 minutes. ... The second query runs as expected: ...
    (microsoft.public.sqlserver.fulltext)

Loading