Re: Architecture question
- From: "Jeremy" <grand@xxxxxxxxxxx>
- Date: Sun, 2 Jul 2006 21:00:35 -0700
Hilary, thanks. Yes, I have read the blog article, but the djvu format is
new to me & I'll study that.
Part of my question had to do with the overall architecture of the process
(I know it's not a technical FTS question, but what better forum to ask it
on?). Boiled down, I'm thinking of a process along these lines:
- The fax machine would store incoming images in a Windows folder. Should
it store each transmission in a single file (a transmission might be a batch
of documents, not just one multi-page document), or each page separately?
- An unattended program would monitor the output of the OCR process and load
the images, with their text (embedded or separate?), into a SQL table where
incremental indexing is running. Up to this point, no human being has
touched or viewed the incoming material.
- A program looking for variants of the customer name would classify the
incoming images by customer and index the documents. A user would handle
exceptions; hopefully the bulk of the images would be correctly indexed
without intervention.
- Customer specialists would work a list of newly arrived, OCR'd images
for their assigned customer, keying data into our system from the images.
FTS can attempt to extract the target data and prefill various fields. The
users would verify and correct.
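
To make the classification step concrete, here is a minimal sketch of matching OCR text against customer name variants, with anything below a threshold left for the exception queue. It assumes Python and its standard difflib module; the customer list, the 0.8 threshold, and the names normalize and best_match are all hypothetical, chosen only for illustration. It is not part of SQL FTS.

```python
import difflib
import re

# Hypothetical customer list: id -> known name variants (illustrative only).
CUSTOMERS = {
    "ACME": ["Acme Corporation", "Acme Corp", "ACME Inc"],
    "GLOBEX": ["Globex Industries", "Globex Ind."],
}


def normalize(s):
    """Lowercase and collapse punctuation/whitespace so OCR noise matters less."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def best_match(ocr_text, customers, threshold=0.8):
    """Return (customer_id, score) for the closest variant.

    customer_id is None when the best score is below the threshold,
    which is the case a user would handle as an exception.
    """
    words = normalize(ocr_text).split()
    best_id, best_score = None, 0.0
    for cust_id, variants in customers.items():
        for variant in variants:
            target = normalize(variant)
            n = len(target.split())
            # Slide a window with the variant's word count over the OCR text.
            for i in range(max(1, len(words) - n + 1)):
                window = " ".join(words[i:i + n])
                score = difflib.SequenceMatcher(None, window, target).ratio()
                if score > best_score:
                    best_id, best_score = cust_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

For example, best_match("Purchase order from Acrne Corp, 12 units", CUSTOMERS) tolerates the common OCR confusion of "rn" for "m" and still picks the ACME entry. The threshold would need tuning against real faxes, since a looser value trades fewer exceptions for more misfiled documents.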
Any thoughts are appreciated.
Jeremy
"Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
news:uFOwoWEnGHA.2264@xxxxxxxxxxxxxxxxxxxxxxx
You might want to look at this technology http://www.djvuzone.org/computer.
Office does automatic OCR on Tiffs, which you can receive on your
It works with SQL FTS.
You could also push the PDFs into SQL Server and have them indexed there
as well. The problem is the quality of your OCR.
Here is an article on how to do it.
http://www.indexserverfaq.com/blobs.htm
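
As a rough illustration of the "push the files into a table" idea, the sketch below loads every file in a folder into a table as a blob, keeping the file extension in its own column. It is only a stand-in: it uses Python's built-in sqlite3 module rather than SQL Server, and the table name incoming_docs, its columns, and the function ingest_folder are hypothetical. Full-text indexing itself is not shown.

```python
import sqlite3
from pathlib import Path


def ingest_folder(conn, folder):
    """Insert every not-yet-seen file in `folder` as a blob row.

    Returns the number of rows added, so an unattended job can run it
    repeatedly without loading the same file twice.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incoming_docs ("
        " file_name TEXT PRIMARY KEY,"
        " file_ext TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )
    added = 0
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # INSERT OR IGNORE skips files whose name is already in the table.
        cur = conn.execute(
            "INSERT OR IGNORE INTO incoming_docs (file_name, file_ext, content)"
            " VALUES (?, ?, ?)",
            (path.name, path.suffix.lower(), path.read_bytes()),
        )
        added += cur.rowcount
    conn.commit()
    return added
```

Keying on the file name keeps the example short; a real loader would more likely key on a content hash or a fax transmission id, since file names can be reused.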