Re: Benchmark for Full Text Search
- From: "Hilary Cotter" <hilary.cotter@xxxxxxxxx>
- Date: Fri, 27 Jun 2008 08:27:48 -0400
We ended up breaking the database into smaller databases of about 50 million rows. Eventually the databases became monthly (again holding about 50 million rows).
Searches were directed against each database - each db had a single catalog and a single sql fts index.
Our queries did not have a real time component to them so we were more tolerant of the search latencies. The long and short of it is we eventually moved to Lucene:)
Contact me offline if you want more info.
Hilary
"sql411@xxxxxxxxxx" <sql411nospamcom@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:576B160F-BCC3-4431-B414-9BF6800FAEAD@xxxxxxxxxxxxxxxx
Hillary, I am interested in how you optimized the fts for a multi-terabyte
catalog. I am in the same situation, a 7TB table with 350+ million rows.
The FT index for this table is over 1.5 TB. Some searches with very unique
words (hence, small # of rows coming back) finish fast – but very common
words can take as long as 40 minutes before we decide to time things out.
What kind of optimizations have you done outside the norm of creating a
catalog/fulltext index of the table? With billions of rows – your key must
be a bigint – did that affect performance on building/searching? Did you add
a unique value to help further filter the data (equivalent to a batch ID or
something that minimizes hits)? Is the underlying data partitioned with SQL
2005, or old school partitioning with an updateable view?
I am sure SQL 2008 will help quite a bit being able to utilize the regular
table filters to the full-text query, but we still have ~3 months before it
releases.
I’ve considered breaking out the table to multiple ones across multiple
servers, and creating multiple full-text catalogs to go against. Anyone who
has done this before on a large scale? I am considering breaking the large
table up to possibly several thousand tables – say 5,000. If I did this and
created a full-text index on each table – what is the optimal relationship to
catalog/full-text index? For example, how many full-text indexes per
full-text catalog are OK before you start having performance issues (my guess
is it probably has more to do with record size/full-text index size than
actual full-text index counts). Has anyone created several thousand
full-text catalogs or full-text indexes on one server in production and it
went well?
Anyone who has helped SQL FT search scale well on many terabytes – I’d
greatly appreciate your hard earned lessons.
Many thanks,
Robert Towne
"Hilary Cotter" wrote:
There are no published benchmarks as they would vary on your content (text,
image, char), what is being indexed (a lot of the same words will perform
differently than a lot of unique words), your machines and table design.
However I was working on a system where we sql fts multi-terabytes and
performance was subsecond for most queryies (over a billion rows IIRC),
however queries which were complex or for which there were a large number of
rows could be very lengthy.
"Ankur Rawat" <AnkurRawat@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:33271B68-1CE5-4DBE-8797-427008625EA8@xxxxxxxxxxxxxxxx
> Hi,
> We are currently using a 3rd party Search Engine to search our > application
> content in SQL server.
>
> Could someone please point me to Benchmark data for the FTS service? I > am
> interested in switching to FTS even if the Performance numbers are
> comparable
> or better.
>
> thanks for any help!
>
> regards,
> Ankur.
>
.
- References:
- Re: Benchmark for Full Text Search
- From: sql411@nospam.com
- Re: Benchmark for Full Text Search
- Prev by Date: Re: Benchmark for Full Text Search
- Next by Date: ft index stats: token frequency, etc ?
- Previous by thread: Re: Benchmark for Full Text Search
- Next by thread: ft index stats: token frequency, etc ?
- Index(es):
Relevant Pages
|