Re: Architecture question



Answers inline.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



"Jeremy" <grand@xxxxxxxxxxx> wrote in message
news:%23iOV2SlnGHA.4728@xxxxxxxxxxxxxxxxxxxxxxx
Hilary, thanks. Yes, I have read the blog article, but the djvu format is
new to me & I'll study that.

Part of my question had to do with the overall architecture of the process
(I know it's not a technical FTS question, but what better forum to ask it
on?). Boiled down, I'm thinking of a process along these lines:

- The fax machine would store incoming images in a windows folder. Should
it store each transmission in a single file (might be a batch of dox, not
just a multi-page doc), or each page?

If it's multiple pages per fax and you search on a word and get a hit on
one page of a fax, how are you going to display all the pages for that fax?
Unless you can figure out a way to do this, each file should be a single page.
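
If each page is stored separately, one possible way to still show the whole
fax for a hit is to keep a fax identifier on every page row. The sketch below
is only an illustration: SQLite stands in for SQL Server, LIKE stands in for
a real full-text predicate, and the table and column names (fax_page, fax_id,
page_no, ocr_text) are made up for the example.

```python
import sqlite3

# Stand-in schema: SQLite replaces SQL Server, and LIKE replaces a real
# full-text predicate. All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fax_page ("
    " fax_id INTEGER, page_no INTEGER, ocr_text TEXT,"
    " PRIMARY KEY (fax_id, page_no))"
)
conn.executemany(
    "INSERT INTO fax_page VALUES (?, ?, ?)",
    [
        (1, 1, "Purchase order from Acme Corp"),
        (1, 2, "Ship to warehouse 7"),
        (2, 1, "Invoice for Globex"),
    ],
)


def pages_for_hits(term):
    """Return every page of each fax that has at least one page matching term."""
    rows = conn.execute(
        "SELECT fax_id, page_no FROM fax_page"
        " WHERE fax_id IN (SELECT fax_id FROM fax_page WHERE ocr_text LIKE ?)"
        " ORDER BY fax_id, page_no",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

With the sample rows, pages_for_hits("warehouse") matches only page 2 of
fax 1 but returns both pages of that fax, so the viewer can page through the
whole transmission.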

- An unattended program would monitor the output of the ocr process & suck
the images with (embedded or separate?) text into a sql table, where
incremental indexing is going on. Up to this point, the incoming stuff has
not been touched or viewed by a human being.

You would probably want the OCR'd data in a text column for faster indexing
speed. Store the binary data in the file system. For administrative purposes
you could store it in the database.
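
A rough sketch of that split, with the OCR text in a text column and only the
image path in the table, might look like the following. It assumes, purely
for illustration, that the OCR step writes page.txt next to page.tif in the
same folder; SQLite stands in for SQL Server, and the names fax_doc,
open_store and load_folder are invented for the example.

```python
import sqlite3
from pathlib import Path


def open_store(db_path=":memory:"):
    """Open the database and create the (hypothetical) fax_doc table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fax_doc ("
        " image_path TEXT PRIMARY KEY,"  # image stays in the file system
        " ocr_text TEXT NOT NULL)"       # text column that would be indexed
    )
    return conn


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM fax_doc").fetchone()[0]


def load_folder(conn, folder):
    """Load each page.txt that has a matching page.tif; return rows added."""
    before = count_rows(conn)
    for text_file in sorted(Path(folder).glob("*.txt")):
        image_file = text_file.with_suffix(".tif")
        if not image_file.exists():
            continue  # OCR text without its image: leave it for a later pass
        # INSERT OR IGNORE makes repeated scans of the same folder harmless.
        conn.execute(
            "INSERT OR IGNORE INTO fax_doc (image_path, ocr_text) VALUES (?, ?)",
            (str(image_file), text_file.read_text(encoding="utf-8")),
        )
    conn.commit()
    return count_rows(conn) - before
```

An unattended job could call load_folder on a timer; because the image path
is the key, rescanning the folder does not create duplicate rows.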

- A program looking for variants on customer name will classify the incoming
images by customer & index the dox. A user would handle exceptions, and
hopefully the bulk of the images would be correctly indexed without
intervention.

Use fuzzy grouping or the thesaurus option to handle spelling variations.
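
To give a feel for the idea, here is a small sketch of matching an OCR'd name
against a customer list. It is not fuzzy grouping or the thesaurus option
itself: it swaps in Python's standard difflib as a stand-in, and the customer
names, the classify function and the 0.8 cutoff are all assumptions made for
the example.

```python
from difflib import get_close_matches

# Hypothetical customer master list; in practice this would come from the database.
CUSTOMERS = ["Acme Corporation", "Globex Industries", "Initech"]


def classify(ocr_name, cutoff=0.8):
    """Return the best-matching customer, or None to route to the exceptions queue."""
    # Compare case-insensitively, but return the name as spelled in the master list.
    lookup = {name.lower(): name for name in CUSTOMERS}
    hits = get_close_matches(ocr_name.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

A typical OCR slip such as "Acme Corporatlon" still resolves to
"Acme Corporation", while a name with no close match returns None and can be
left for a person to classify.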

- Customer specialists will work a list of newly arrived and ocr'd images
for their assigned customer, keying into our system from the images. FTS can
attempt to extract the target data & prefill various fields. The users will
verify & correct.

Any thoughts are appreciated.

Jeremy


"Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
news:uFOwoWEnGHA.2264@xxxxxxxxxxxxxxxxxxxxxxx
You might want to look at this technology http://www.djvuzone.org/

Office does automatic OCR on TIFFs, which you can receive on your computer.
It works with SQL FTS.

You could also push the PDFs into SQL Server and have them indexed there as
well. The problem is the quality of your OCR.

Here is an article on how to do it.

http://www.indexserverfaq.com/blobs.htm




