Architecture question
- From: "JeremyGrand" <jeremy@xxxxxxxxxxxxxx>
- Date: Thu, 29 Jun 2006 11:09:04 -0700
We receive hundreds of faxes per week. The fax machine dumps PDFs into a
folder on our server (other formats are possible). The goal is to identify
the customer and project each fax applies to, extract a small amount of
specific data from the document, add a record to our database, and store
the image so it can be retrieved while browsing the customer's records.
We have OCR software (ABBYY FineReader) that can monitor a folder for new
images, OCR them, and store the output in any of several file types (but
apparently not in SQL Server). So I'm thinking we'll build something that
monitors the OCR output folder and sucks the PDFs into a SQL table, where
full-text search (FTS) will index them incrementally.
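A minimal sketch of the folder-monitoring step, under some loud assumptions: SQLite stands in for SQL Server, the table `fax_document` and its columns are hypothetical, and the OCR step is assumed to write a sidecar `.txt` file next to each PDF with the same name. In SQL Server you would instead define a full-text index on the text column and let it populate incrementally.

```python
import sqlite3
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table; in the real system this would live in SQL Server,
    # with a full-text index defined on the ocr_text column.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fax_document (
               id INTEGER PRIMARY KEY,
               file_name TEXT UNIQUE NOT NULL,
               pdf BLOB NOT NULL,
               ocr_text TEXT NOT NULL,
               customer TEXT
           )"""
    )


def ingest_new_files(conn: sqlite3.Connection, folder: Path) -> list[str]:
    """Load every PDF in `folder` that is not yet in the table.

    Assumes the OCR step writes a sidecar text file with the same stem
    (e.g. fax123.pdf + fax123.txt)."""
    known = {row[0] for row in conn.execute("SELECT file_name FROM fax_document")}
    added = []
    for pdf_path in sorted(folder.glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")
        if pdf_path.name in known or not txt_path.exists():
            continue  # already loaded, or the OCR text has not been written yet
        conn.execute(
            "INSERT INTO fax_document (file_name, pdf, ocr_text) VALUES (?, ?, ?)",
            (pdf_path.name, pdf_path.read_bytes(), txt_path.read_text(encoding="utf-8")),
        )
        added.append(pdf_path.name)
    conn.commit()
    return added
```

Calling `ingest_new_files` in a loop with a short `time.sleep` between passes gives simple polling; a PDF whose OCR text has not appeared yet is just picked up on a later pass.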
The customer's name and project number, possibly misspelled, will be in the
OCR'd text.
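Because the names may be misspelled, exact matching won't do. One possible approach, sketched here with Python's `difflib` (not anything FineReader or SQL Server provides), is to slide a window the length of each known customer name across the OCR'd words and keep the best similarity score above a cutoff. The customer list and the 0.8 cutoff are illustrative assumptions.

```python
import difflib
import re
from typing import Optional

_WORD = re.compile(r"[a-z0-9&]+")


def guess_customer(ocr_text: str, customers: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the customer name that best matches some run of words in
    the OCR text, or None if nothing scores at least `cutoff`."""
    words = _WORD.findall(ocr_text.lower())
    best_name, best_score = None, 0.0
    for name in customers:
        target = " ".join(_WORD.findall(name.lower()))
        n = len(target.split())
        # Compare the name against every run of n consecutive OCR words.
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            score = difflib.SequenceMatcher(None, target, window).ratio()
            if score >= cutoff and score > best_score:
                best_name, best_score = name, score
    return best_name
```

Faxes that return `None` would go to a manual sorting pile, mirroring what the people doing data entry already do with paper.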
The people now doing data entry from paper documents first sort them by
customer and distribute them to specialists, who work on a single customer
at a time. We're thinking of organizing the image handling the same way.
We'll classify incoming faxes by customer, perhaps with help from a program
that looks for variants of each customer name. Then customer specialists
will work through a list of newly arrived, OCR'd images for their assigned
customer. FTS can attempt to extract the target data and prefill various
fields; the users will then verify and correct them.
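The prefill-and-verify step could look roughly like the sketch below. It uses a plain regular expression rather than SQL Server full-text search, and the project-number format (`Project 4471`, `Proj # 4471`, `Project No. 4471`) and the `WorkItem` fields are hypothetical. Unverified items are grouped into one queue per customer, matching the way specialists already work.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: assumes project numbers are 3-6 digits after "Project"/"Proj".
PROJECT_RE = re.compile(r"\bproj(?:ect)?\s*(?:no\.?|#)?\s*(\d{3,6})\b", re.IGNORECASE)


@dataclass
class WorkItem:
    file_name: str
    customer: str
    project_no: Optional[str]  # prefilled guess; the specialist verifies it
    verified: bool = False


def prefill(file_name: str, customer: str, ocr_text: str) -> WorkItem:
    """Build a work item with the project number guessed from the OCR text."""
    m = PROJECT_RE.search(ocr_text)
    return WorkItem(file_name, customer, m.group(1) if m else None)


def build_queues(items: list[WorkItem]) -> dict[str, list[WorkItem]]:
    """Group unverified items into one work list per customer."""
    queues: dict[str, list[WorkItem]] = defaultdict(list)
    for item in items:
        if not item.verified:
            queues[item.customer].append(item)
    return dict(queues)
```

Keeping the guessed value in a separate field with a `verified` flag makes it easy to measure later how often the automatic prefill was right.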
Does anyone have comments on this? I'd bet lots of folks have solved this
problem, and I'd love to hear how you did it.
Jeremy