Catalog Population VS Database Backup

Tech-Archive recommends: Fix windows errors by optimizing your registry

From: RGondzur (rgondzur_at_NO_SPAM_aicsoft.com)
Date: 03/19/04


Date: Fri, 19 Mar 2004 16:01:09 -0500

MS Windows 2000 Adv Server
SQL Server 2000

The FTS catalog is stored on the same box as SQL Server.

There are two applications that access a SQL database.

I have one application that stores data from an OCR process in a SQL table.
The table includes a timestamp column.
FTS indexes one text column on this table.
The table currently holds 500,000 rows.

I have FTS programmed for incremental population.

I currently perform OCR on images in batches of 500 items at a time.
The OCR data is inserted into the SQL table as it is processed.
At the completion of the batch, I then start the catalog population to index
the text added to the table.
Each population is taking approximately 20 minutes.
During the population process, CPU utilization on the SQL box hits 80+
percent.
Users do not access this application directly and it runs on its own box on
the network.

The second application, the one being accessed by users, searches against
the FT Search catalog looking for text contained in the images that were
OCRed
.
Since the first app can initiate a catalog population event at the end of
OCR batch process, it is causing poor performance for
all users of the second app while it is indexing. Users are complaining of
slow response.

I am considering moving the catalog population to one scheduled job that
runs nightly and indexes all of the images that were processed during the
day. This will likely cause a population to take 6-8 hours per night.

In addition, the nightly database backup takes 5-6 hours per night.

Q1.
Is there a way to minimize the impact of the catalog population on the
database using the existing method that starts the indexing process after
each batch?
I like the idea of indexing small batches as they are processed and keeping
the indexing time to approximately 20 minutes
per batch. The drawback is the poor response for all users during this time.

Q2.
If I go to one scheduled population job nightly, what affect does the
catalog population job have on the database backup job when the two overlap?
I'm afraid that both jobs running concurrently will greatly increase their
completion time and they won't finish before the start of the new day.



Relevant Pages

  • Re: Benchmark for Full Text Search
    ... I am interested in how you optimized the fts for a multi-terabyte ... Is the underlying data partitioned with SQL ... full-text catalog are OK before you start having performance issues (my guess ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Performance problems -- need guidance on scaling
    ... by caching queries I mean caching the search results so when a search on ... the search would be directed to the catalog. ... Looking for a SQL Server replication book? ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Performance problems -- need guidance on scaling
    ... So we're now CPU ... I have read that putting the catalog and SQL table on separate RAID ... Our current server will support only a single CPU. ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Catalog Population VS Database Backup
    ... Since you're using SQL Server 2000, ... Is there a way to minimize the impact of the catalog population on the ... Yes, use "Change Tracking" with "Update Index in Background", this was ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Locks & FT Catalogs
    ... Director of Text Mining and Database Strategy ... Looking for a SQL Server replication book? ... in the 6 hours after that either) leaving that catalog completely unusable ... synchronise their changes from staging to live (or vice versa to roll ...
    (microsoft.public.sqlserver.fulltext)