Re: newbie: Full Text Search Against PDF Blobs



With Lucene you really have to roll your own solution; it is only a
full-text search engine. You have to write the code that queries it and
the code that feeds it documents to index. Lucene is designed for the 5-10
million document range, but can be scaled much higher. It is optimized to
return results in batches of 10, 20, 25 or 100. If you return all
results, its performance is much worse than SQL FTS.
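
To illustrate why asking for a small batch is cheaper than asking for every hit, here is a minimal Python sketch. It is a toy inverted index, not Lucene's code or API; the names build_index and search_top_n, the term-frequency scoring, and the sample documents are all assumptions made up for this example. The point is only that a top-N search can keep N candidates in a heap instead of sorting the full result set.

```python
import heapq
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search_top_n(index, query, n=10):
    """Return the n best (doc_id, score) pairs without sorting every hit."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf  # hypothetical scoring: summed term frequency
    # nlargest tracks only n candidates at a time; ties break on doc_id.
    return heapq.nlargest(n, scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical sample data standing in for text extracted from report PDFs.
docs = {
    "r1": "annual sales report sales forecast",
    "r2": "sales report",
    "r3": "engineering status report",
}
index = build_index(docs)
top = search_top_n(index, "sales report", n=2)
```

Calling search_top_n with a small n does far less ranking work than asking for everything when a query matches a large share of the collection, which is the trade-off described above.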

Lucene allows you to do true property-based searches.
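
As a rough picture of what a property-based (fielded) search means, here is a small Python sketch. It is not Lucene code; build_field_index, search_field, the field names and the sample records are hypothetical. The idea is that each term is indexed together with the property it came from, so a search for a word in the title does not match the same word in the body. In Lucene's classic query syntax this kind of search is written as field:term, for example title:sales.

```python
from collections import defaultdict


def build_field_index(docs):
    """Index every term under a (field, term) key so fields stay separate."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for term in str(value).lower().split():
                index[(field, term)].add(doc_id)
    return index


def search_field(index, field, term):
    """Return sorted ids of documents whose given field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


# Hypothetical records: properties plus body text for two reports.
reports = {
    "r1": {"author": "alice", "title": "Quarterly sales", "body": "sales rose in the north"},
    "r2": {"author": "bob", "title": "Replication notes", "body": "sales data is copied nightly"},
}
field_index = build_field_index(reports)
title_hits = search_field(field_index, "title", "sales")
body_hits = search_field(field_index, "body", "sales")
```

Here title_hits contains only r1, while body_hits contains both reports, because the word appears in both bodies but in only one title.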

SQL FTS is highly scalable, but you really have to think about partitioning
once you hit 50 million rows.

You really have to test to see what works best in your environment.

--
relevantNoise - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Des" <support@xxxxxxxxxxxxxx> wrote in message
news:%235elAfg3HHA.2064@xxxxxxxxxxxxxxxxxxxxxxx
I have a client for whom this solution sounds perfect.

Does anyone know of a web site where I can test the "Full Text Search
Against PDF Blobs" functionality?

The website "layout design guy" is saying that SQL 2005 will be too slow
and we should use "Lucene", an open source indexer instead.
Does anyone have any info I can use to show this guy that SQL Server 2005
will be faster?

The target site has several thousand report PDFs averaging 2 MB each
(about 10 GB in total).

I have watched the video
"http://download.microsoft.com/download/b/3/8/b3847275-2bea-440a-8e2e-305b009bb261/sql_13.wmv"
that was referenced in this group recently.


Thanks,
Des

