Re: FTS Performance in SQL 2005

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance



Sorry, here's the DDL
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[fulltexttable](
[id] [int] IDENTITY(1,1) NOT NULL,
/* primary fulltext key */
[updID] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
/* used to update the fulltexttable */
[srcID] [int] NOT NULL,
/* source article id */
[srchField] [varchar](150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
/* keyword field */
[src] [varchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
/* source description */
[updateStampRemote] [binary](50) NULL,
/* timestamp remote field */
[updateStamp] [timestamp] NOT NULL,
/* local timestamp for change tracking */
CONSTRAINT [PK_fulltexttable] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
CREATE STATISTICS [_dta_stat_1582172932_1_2_3] ON
[dbo].[fulltexttable]([id], [src], [srcID])
GO
CREATE STATISTICS [_dta_stat_1582172932_3_1] ON
[dbo].[fulltexttable]([srcID], [id])
GO
CREATE STATISTICS [_dta_stat_1582172932_3_5_1] ON
[dbo].[fulltexttable]([srcID], [src], [id])

Indexes:
CREATE NONCLUSTERED INDEX [IX_fulltexttable] ON [dbo].[fulltexttable]
(
[srcID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF) ON [PRIMARY]

CREATE UNIQUE NONCLUSTERED INDEX [IX_fulltexttable_1] ON
[dbo].[fulltexttable]
(
[src] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_fulltexttable_2] ON [dbo].[fulltexttable]

(
[srchField] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF) ON [PRIMARY]

CREATE UNIQUE NONCLUSTERED INDEX [IX_fulltexttable_3] ON
[dbo].[fulltexttable]
(
[id] ASC,
[srcID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF) ON [PRIMARY]

ALTER TABLE [dbo].[fulltexttable] ADD CONSTRAINT [PK_fulltexttable]
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON
[PRIMARY]

KaMa schrieb:

Full-Text Table:
id - int - primary key - clustered (FULL TEXT KEY)
srcID - varchar(30) (LINKING KEY TO ARTICLE)
srchField - varchar(150) - FULL-TEXT INDEXED
src - varchar(15) (SOURCE OF KEYWORD)
updateStamp - timestamp (LOCAL TIMESTAMP)

Indexes (fields / description)
0 - [id] - Primary Key - unique / clustered
1 - [id], [srcID] - unique / nonclustered
2 - [srchField] - non-unique / nonclustered
3 - [srcID] - unique / nonclustered
4 - [id] - non-unique / nonclustered

Index in use when using a CONTAINS or CONTAINSTABLE (1).

I can't imagine that the memory is a bottleneck, because I do not see
any change in memory usage when I use a fulltext query...
Do you think i should reduce the max memory of sql server to <3.5 GB
(currently set to 3.5 GB of 4 GB)?

Another question - what means DDL? :)

Thanks!

Simon Sabin schrieb:

The artilce title is won't help eiother as the full text engine doesn't return
this.

The reason that the CPU usage is like that is probably because you bottleneck
is memory.

FTS returns all the keys that match the criteria it will then join to your
main table for each key returned, if the indexing is not right then you will
end up reading all the data from the main table. Whilst this may be cached
it still has to be processed, if its not cached it will be being read from
disk which is worse.

Your only option is to ensure that you are only returning the records you
need from the FT engine. i.e all criteria is passed. and that you have a
covering index on the columns in the query.

Can you please post the DDL for the table and the indexes on the table.


Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons


Hello Dan,
I know that the upper queries are mostly extremes, but I don't want do
construct a code wich can run into timeouts or searches which take ~
25-40 seconds - even in the hardest cases...
The problem with partitioning is that I need to give different ways of
ordering top returned articles (e.g. views, date..) and so i cannot do
several full-text catalogs wich access different tables...

The problem with the ranking-order is that none of my order-clauses
use the searched fields itself (except ordering for the article
title), and i've tried CONTAINSTABLE with TOP_N_RANK with some tests
and got completely wrong results...


No partial search - here the problem is that I have some words which
occur several thousand times, the fulltexttable is constructed like a
keyword list - so if someone searched for a keyword that occurs
several times - the problem would occur again...

I've already implemented a caching scenario to the whole thing which
caches statistics, counts and results of the search...

Do you by any chance know how to give the SQL-FT service more
priority? Because what really buggers me is that if I execute a big
query on FTS, almost no CPU/Memory is used by the service at all - I
didn't find any documentation on the memory behaviour of SQL2005/FT
compared to SQL2000/FT - because in SQL2000/FT you had to fix the
maximum of ram SQL uses to give FT more memory...

I've already tried to create a multiple-column scenario, but this
didn't work out due to following problem...

id | srcID | title | alternate_title | keywords | names

Now if i've a field 'names' containing - let's say - following data:
KaMa, Daniel Chrichton, Luis Mortan

Now if I search for 'Daniel Mortan' - the record would appear even if
the data is not correct...
This way i could reduce the number of records returned, but still the
problem persists that this wouldn't be scalable, because if some
keywords would appear in a lot of records - there still are a lot of
records to return :).
Daniel Crichton schrieb:

KaMa wrote on 18 Jul 2006 01:46:46 -0700:

Hi Dan - Thanks for answering!

I've got to pulling the KEY directly from the fulltexttable because
in
cases of large results this seems to be faster.
Currently I have a solution (still to slow) wich does the following
(shortened version):
INSERT INTO tmpTable(uid, id) SELECT @uid, [KEY] FROM
CONTAINSTABLE(fulltexttable,srchField,'"an*"')
then...
INSERT INTO tmpTable2(uid, srcID) SELECT @uid, srcID FROM
fulltexttable
AS tbl INNER JOIN tmpTable AS res ON ft.id=res.ID GROUP BY srcID
I'm aware that this is a step too much, but on large recordsets it
seems to be faster.

Seems strange if it is, as you're doing pretty much what a straight
CONTAINS query and retrieving srcID should be doing.

On your database, if you search for large resultsets (e.g. an* or
similar), does it also take that long until the returning of all
recordsets is completed?

My table is only 510,000 rows so it's not as large as yours.

Wouldn't it be faster to somehow directly query the
full-textservice?

When retrieving just the [KEY], that's what you're doing. I don't
know of a way to access the FTS service outside of SQL Server, if
there is a way at all.

I've already had an index containing ID, srcID and src - wich was
accessed with CONTAINS - so that shouldn't be a problem...

To compare the queries, put them all into Query Analyser and turn on
Show Exection Plan, and then run the whole lot - the Execution Plan
window will then give you a percentage query cost relative to the
whole batch, making it easier to spot which appears to be the best
solution. Basing it on times is a bad idea as caching can affect the
results.

If you can do so, try to build your queries to match on whole words,
or reduce the number of results you pull back. The more results you
have, the worse your performance will be as SQL Server will have to
look up additional data, and with a very large number of rows you end
up with a Merge Join/Inner Join which adds overhead on the Sort. For
example, if I run the following, the query cost for each is:

select ID from STK where CONTAINS(STK.QuickSearch, '"windows"')
Results: 1272, Cost: 14.8% (nested loop)

select ID from STK where CONTAINS(STK.QuickSearch, '"window*"')
Results: 1450, Cost: 14.9% (nested loop)

select ID from STK where CONTAINS(STK.QuickSearch, '"win*"') Results:
7269, Cost: 15.28% (merge join with sort)

select ID from STK where CONTAINS(STK.QuickSearch, '"wi*"') Results:
60325, Cost: 23.78% (merge join with sort)

select ID from STK where CONTAINS(STK.QuickSearch, '"w*"') Results:
176914, Cost: 31.16% (hash match)

As you can see, the more rows that are returned the more work is done
to pull information from the table.

Comparing on the FTS search itelf, it's obvious that returning a lot
of rows is a bad idea:

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"windows"') Cost:
3.09%

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"window*"') Cost:
3.12%

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"win*"') Cost:
4.15%

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"wi*"') Cost:
30.02%

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"w*"') Cost: 59.62%

Notice how the last two cost significantly more due to the number of
rows being returned. Retrieving data from the FTS engine is a
comparitively slow process. This is much more obvious when you use
TOP_N_BY_RANK to reduce the number of results.

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"w*"') Cost: 97.95%

select [KEY] from CONTAINSTABLE(STK,QuickSearch, '"w*"',1000) Cost:
2.05%

If you can find a way to work with the TOP_N_BY_RANK and
CONTAINSTABLE you should be able to optimise your queries
significantly. Luckily I rarely have queries that take longer than 2
or 3 seconds to run due to only having 510,000 rows, and in most
cases less than 2,000 results for any given search. I notice in your
original post that you say that you cannot do this due to the results
being "ordered totally wrong". There is nothing stopping you using an
ORDER BY on the returned results to change the sort order, and using
the ISABOUT, NEAR, and other keywords in FTS to adjust the way the
rank is calculated to try to ensure that you get just the results
that you need from the FTS search.

Dan


.