Re: Fulltext search too slow
- From: DC <dc@xxxxxxxxx>
- Date: Thu, 02 Aug 2007 09:36:01 -0700
On 2 Aug., 17:27, "Daniel Crichton" <msn...@xxxxxxxxxxxxxxxx> wrote:
DC wrote on Thu, 02 Aug 2007 06:57:59 -0700:
Hi,
I am using an FT index on a column (varchar(4000) with an average
length of 100) in a table that contains about 1 million rows.
My tiny performance test looks like
declare @t datetime set @t = getdate()
select count(*) from containstable(table_name, expression,
'"searchthis*"')
print datediff (ms, @t, getdate())
If "searchthis" matches about 1000 rows, the search will usually take
about 2 seconds. But when it matches a lot of rows (some terms will
match up to 200.000 rows) then the search takes up to one minute. With
5.000 matches the search already takes about 15 seconds. It helps a bit
to restrict the ft search to Top-20000-of-rows but not enough and
I need many results (not to display them but to display how many
results of what type were found).
This is too slow for my application and I am sure the performance has
degraded for some reason. I am rebuilding the ft index every week and
reorganizing it every night. CPU and disk queues look relaxed so there
might be a less apparent bottleneck.
Does someone know where I should start looking for an enhancement, or
are these very rough performance figures normal for a 4 * 3 GHz
machine?
Thanks for any hint in advance,
Regards
DC
There are many factors that will affect the speed of your queries, so it's
hard to recommend any one thing to look for.
I've got a 2 * 3 Ghz server here that I run my websites on, with a product
database of around 750000 items. For searches with small numbers of results
(less than 5000 for instance), it runs pretty quickly and there is little
difference between using a wildcard and not. Trying to find a word that will
give me around 20000 results without a wildcard, and maybe double that with
one, is going to take me a while to nail down so I can't really run any sort
of comparison here myself, but I can tell you that if I search for "the" it
takes 28 seconds to return just over 300,000 results, and with "the*" it
takes only 13 secs to bring back just over 320,000 results. However, this
was after some intial runs so that data is cached - with uncached data it
takes 4 or 5 times as long. Of course, searching for "the" followed by
"the*" is not a great test, as some of the data from the first search will
remain cached for the second one to use, but with repeated runs of just
"the" I seem to get no improvement beyond 25 seconds or so, yet "the*" drops
from 13 secs to 6 secs with repeats.
You'll have to do some extensive testing to work out how to improve the
result speed, but the obvious one is to limit the number of results as far
as possible, and to try to only allow searching for words that will return
few results.
Given the results I've found in my own database I'm going to run some tests
for actual searches customers have entered and investigate whether
wildcarding them will in general reduce the search times (obviously at the
expensive of less exact matches being returned), however our average search
response time is under a second for customer entered searches anyway so it
may just be a waste of time for me (it's rare that a customer will enter a
single common word like "the" or "of" to search for, they tend to enter a
few words that return few matches).
Dan- Zitierten Text ausblenden -
- Zitierten Text anzeigen -
Thank you for sharing all this info, Dan!
It pretty much looks like I will not be getting much more performance
from the build in ft engine. With SQL Server 2000 I have been using
SQL Turbo, which was a lot faster and had great ranking features - but
it was poorly supported so I was happy to get away from it thinking
that MS finally found out how design a solid fulltext search - which
apparently they have not. Luckily I am not fully relying ft index, I
am also caching search terms and their results. I am using this for a
security paper database so the searches repeat often and most of the
hits can be answered without the ft index.
I ran the litte test script attached to get a better feel for the
dependency of search expression, number of hits and latency fo the
search.
expression: "bank*", hits: 177558,
elapsedms: 20746
expression: "oil*", hits: 5405,
elapsedms: 360
expression: "citibank*", hits: 20478,
elapsedms: 296
expression: "daimler*", hits: 11294,
elapsedms: 220
expression: "daimlerc*", hits: 10815,
elapsedms: 110
expression: "daimlerch*", hits: 10813,
elapsedms: 106
expression: "daimlerchr*", hits: 10807,
elapsedms: 96
That was uncached. Why did "bank*" take so much longer compared to
"citibank*"? Next run:
expression: "bank*", hits: 177558,
elapsedms: 1126
expression: "oil*", hits: 5405,
elapsedms: 46
expression: "citibank*", hits: 20478,
elapsedms: 126
expression: "daimler*", hits: 11294,
elapsedms: 106
expression: "daimlerc*", hits: 10815,
elapsedms: 93
expression: "daimlerch*", hits: 10813,
elapsedms: 80
expression: "daimlerchr*", hits: 10807,
elapsedms: 76
That of course looks better. Subsequent runs are about as fast but of
course SQL Server will wipe the cache rather soon. Another example,
uncached:
expression: "Stock*", hits: 104299,
elapsedms: 1233
expression: "Mineral*", hits: 580,
elapsedms: 110
expression: "Oat*", hits: 241,
elapsedms: 30
expression: "AG*", hits: 172584,
elapsedms: 2016
expression: "Corp*", hits: 10725,
elapsedms: 123
expression: "Option*", hits: 418614,
elapsedms: 7503
expression: "Amazon*", hits: 1104,
elapsedms: 13
Hm. This one was fine too, and the expressions were not cached. What
was the problem with the initial "bank*"?
I read something about the ft engine trying to connect to the internet
for it's wordbreaker, maybe that has to do with it? I will check.
Regards
DC
set nocount on
declare @t table (ex varchar(31));
insert into @t (ex) values ('"bank*"');
insert into @t (ex) values ('"Mineral*"');
insert into @t (ex) values ('"Oat*"');
insert into @t (ex) values ('"AG*"');
insert into @t (ex) values ('"Corp*"');
insert into @t (ex) values ('"Option*"');
insert into @t (ex) values ('"Amazon*"');
declare
@ex varchar(31),
@st datetime,
@hits int
declare c cursor for
select ex from @t;
open c;
fetch next from c into @ex;
while @@fetch_status = 0
begin
set @st = getdate();
select @hits = count(*) from containstable(tbl_instruments_ft,
expression, @ex);
'expression: '
+ convert(varchar, @ex)
+ ', ' + space(31 - len(@ex)) + 'hits: '
+ convert(varchar, @hits)
+ ', ' + space(10 - len(convert(varchar, @hits))) + 'elapsedms: '
+ convert(varchar, datediff (ms, @st, getdate()));
fetch next from c into @ex;
end
close c
deallocate c
.
- Follow-Ups:
- Re: Fulltext search too slow
- From: Daniel Crichton
- Re: Fulltext search too slow
- From: Simon Sabin
- Re: Fulltext search too slow
- References:
- Fulltext search too slow
- From: DC
- Re: Fulltext search too slow
- From: Daniel Crichton
- Fulltext search too slow
- Prev by Date: Re: Fulltext search too slow
- Next by Date: Re: Full Text catalog not working
- Previous by thread: Re: Fulltext search too slow
- Next by thread: Re: Fulltext search too slow
- Index(es):
Loading