Re: Performance issue with CONTAINS
- From: "Hilary Cotter" <hilary.cotter@xxxxxxxxx>
- Date: Mon, 11 Sep 2006 13:47:56 -0400
Unfortunately it does.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
<tony.newsgrps@xxxxxxxxx> wrote in message
news:1157988051.295272.49260@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hilary,
Thank you for your answer.
Does the trimming issue exist with ContainsTable too? If I don't set
any range limit, does ContainsTable return all records matching the
searched words or only the top X results?
beginning of the thread are equivalent, so I'm confused about the wholeFrom what I understand the two queries boblotz2001 mentioned at the
discussion that followed about partitioning or using another search
engine.
Tony.
Hilary Cotter wrote:
It applies to SQL 2000 or SQL 2005.
Containstable has some additional features that Contains does not have.
One
of them is the ability to limit the range to the highest ranked X
results,
ie the first 100 results. Another is that it returns the rank value, and
you
can also use it to join on different tables which have a similar key.
ContainsTable returns faster results as well - mainly because only two
columns are being returned.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
<tony.newsgrps@xxxxxxxxx> wrote in message
news:1157931106.228834.276490@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Simon, Hilary,
What is the difference between these two queries:
SELECT * FROM PERSON WHERE company_id = 7410 AND CONTAINS(first_name,
'Daniel')
SELECT person.* FROM CONTAINSTABLE(PERSON, first_name, 'Daniel') AS a
JOIN person ON a.key = person.person_id WHERE company_id = 7410
shouldn't they return the same results? Are they interchangeable? If
not, when should I use CONTAINS? When should I use CONTAINSTABLE?
Does the same phenomena happens with FREETEXT as well?
BTW, I'm assuming this thread applies to Sql Server 2005 here, right?
Thank you,
Tony.
Simon Sabin wrote:
Hello boblotz2001,
The lack of an ability to put the extra data into the index is a big
pain
especially when the keyword is not very selective. Of the two options
posed,
I would go for the addition of extra text in the search text i.e.
COMP1234
It is more flexible and scalable. We implemented both and I am
regretting
doing the partition one.
I woul always use containstable, I believe it simplifies the options
the
optimiser has to consider.
You may want to look at SQL Turbo
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons
Hillary,
Thanks for your reply... I am not sure if I will be able to
implement
either solution. The first one is a bit drastic and carries a fair
amount of overhead with it. Besides, as you mentioned I do have a
good amount of companies. Also, doesn't sound like this change
would
affect the execution plan. Index Server would still return a large
recordset and try to filter it down but now instead of millions of
row
it would deal with thousands.
Second solution is also problematic because I gave company ID as an
example but there are other predicates that can be used in the
query.
For example, the search might be zip_code and last name instead of
company_id and first_name. I don't want to have to include every
permutation of search criteria in the index.
I guess my point is why is this happening specifically? Is it the
size of the table, shortcoming of the technology, improper indexing
on
the table, something else?
Thanks
Hilary Cotter wrote:
This is a well known phenomena known as trimming. The best way to
solve this is by creating an indexed view which contains the data
you
are looking for, i.e. company_id=7410 or you partition your table
into different tables by company_id. For large numbers of
company_id
this will not work.
Another option is to add company_id to your content and then search
like this
SELECT * FROM PERSON WHERE company_id = 7410 AND
CONTAINS(first_name,
'Daniel' and '7410') takes 2.5 minutes.
--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"boblotz2001" <boblotz2001@xxxxxxxxx> wrote in message
news:1157486524.708356.37770@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Seemingly a simple thing but something is wrong:
SELECT * FROM PERSON WHERE company_id = 7410 AND
CONTAINS(first_name, 'Daniel') takes 2.5 minutes.
SELECT person.* FROM CONTAINSTABLE(PERSON, first_name, 'Daniel')
AS
a JOIN person ON a.key = person.person_id WHERE company_id = 7410
takes under a second.
Person table has 1101665 Million rows. 3893 rows have company_id =
7410. 6836 rows contain 'Daniel'. Total of 32 rows are returned.
Catalog attributes:
Item Count: 1101665
Unique Key Count: 4271510
Catalog Size: 128 Mb
The second query runs as expected: person table performs an index
seek on the company_id index and returns 3893 rows, remote scan
returns 6836 rows and a Hash Match join is used to produce 32
rows.
The second query starts the same by using the company_id index and
returning 3893 rows but then things get ugly. Remote scan show
estimated number of rows as 6836 but returns 26,510,733 rows and a
filter is then applied to filter the number of rows to 32!
Needless
to say that takes a while.
Need to add the fact that the catalog contains one table 'PERSON'
but
there are about
dozen columns in PERSON table being indexed.
Any thoughts?
Thanks
.
- References:
- Re: Performance issue with CONTAINS
- From: boblotz2001
- Re: Performance issue with CONTAINS
- From: Simon Sabin
- Re: Performance issue with CONTAINS
- From: tony . newsgrps
- Re: Performance issue with CONTAINS
- From: Hilary Cotter
- Re: Performance issue with CONTAINS
- From: tony . newsgrps
- Re: Performance issue with CONTAINS
- Prev by Date: Re: Must a hyphen always be a wordbreaker?
- Next by Date: Re: Must a hyphen always be a wordbreaker?
- Previous by thread: Re: Performance issue with CONTAINS
- Next by thread: Re: What characters are considered as word boudaries
- Index(es):
Relevant Pages
|
Loading