Re: Performance issue with CONTAINS



Indexing Services as well as other search engines like Lucene allow you to
store data in the index so you can trim in the search engine and return
trimmed results. The problem is that the more data you store the slower your
searches. SQL FTS does not allow you to store data in the index and hence
you have to trim the results set as it comes back.

So other than partitioning or using a different search engine there is no
way around this.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



"boblotz2001" <boblotz2001@xxxxxxxxx> wrote in message
news:1157552635.152940.56430@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hillary,

Thanks for your reply... I am not sure if I will be able to implement
either solution. The first one is a bit drastic and carries a fair
amount of overhead with it. Besides, as you mentioned I do have a good
amount of companies. Also, doesn't sound like this change would affect
the execution plan. Index Server would still return a large recordset
and try to filter it down but now instead of millions of row it would
deal with thousands.

Second solution is also problematic because I gave company ID as an
example but there are other predicates that can be used in the query.
For example, the search might be zip_code and last name instead of
company_id and first_name. I don't want to have to include every
permutation of search criteria in the index.

I guess my point is why is this happening specifically? Is it the size
of the table, shortcoming of the technology, improper indexing on the
table, something else?

Thanks


Hilary Cotter wrote:
This is a well known phenomena known as trimming. The best way to solve
this
is by creating an indexed view which contains the data you are looking
for,
i.e. company_id=7410 or you partition your table into different tables by
company_id. For large numbers of company_id this will not work.

Another option is to add company_id to your content and then search like
this

SELECT * FROM PERSON WHERE company_id = 7410 AND CONTAINS(first_name,
'Daniel' and '7410') takes 2.5 minutes.


--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

"boblotz2001" <boblotz2001@xxxxxxxxx> wrote in message
news:1157486524.708356.37770@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Seemingly a simple thing but something is wrong:

SELECT * FROM PERSON WHERE company_id = 7410 AND CONTAINS(first_name,
'Daniel') takes 2.5 minutes.

SELECT person.* FROM CONTAINSTABLE(PERSON, first_name, 'Daniel') AS a
JOIN person ON a.key = person.person_id WHERE company_id = 7410 takes
under a second.

Person table has 1101665 Million rows. 3893 rows have company_id =
7410. 6836 rows contain 'Daniel'. Total of 32 rows are returned.

Catalog attributes:
Item Count: 1101665
Unique Key Count: 4271510
Catalog Size: 128 Mb

The second query runs as expected: person table performs an index seek
on the company_id index and returns 3893 rows, remote scan returns 6836
rows and a Hash Match join is used to produce 32 rows.

The second query starts the same by using the company_id index and
returning 3893 rows but then things get ugly. Remote scan show
estimated number of rows as 6836 but returns 26,510,733 rows and a
filter is then applied to filter the number of rows to 32! Needless to
say that takes a while.

Need to add the fact that the catalog contains one table 'PERSON' but
there are about
dozen columns in PERSON table being indexed.

Any thoughts?

Thanks




.


Loading