Re: Architecture question



Hilary, thanks. Yes, I have read the blog article, but the DjVu format is
new to me, and I'll study it.

Part of my question had to do with the overall architecture of the process
(I know it's not a technical FTS question, but what better forum to ask it
on?). Boiled down, I'm thinking of a process along these lines:

- The fax machine would store incoming images in a Windows folder. Should
it store each transmission in a single file (which might be a batch of docs,
not just one multi-page doc), or one file per page?
- An unattended program would monitor the output of the OCR process and pull
the images, with (embedded or separate?) text, into a SQL table where
incremental indexing is going on. Up to this point, the incoming material
has not been touched or viewed by a human being.
- A program looking for variants of the customer name would classify the
incoming images by customer and index the docs. A user would handle
exceptions; hopefully the bulk of the images would be correctly indexed
without intervention.
- Customer specialists would work a list of newly arrived and OCR'd images
for their assigned customer, keying into our system from the images. FTS
could attempt to extract the target data and prefill various fields. The
users would verify and correct.
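
To make the classification step a bit more concrete, here is a rough Python
sketch of matching OCR text against known customer-name variants. It is only
an illustration under assumptions: the customer table, the variant spellings,
the 0.8 cutoff, and the function names are all made up, and it uses nothing
but difflib and re from the standard library rather than any SQL FTS feature.

```python
import difflib
import re

# Hypothetical customer table: canonical name -> spellings seen on faxes.
CUSTOMER_VARIANTS = {
    "Acme Corporation": ["acme corporation", "acme corp", "acme co"],
    "Globex Industries": ["globex industries", "globex ind", "globex"],
}


def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def classify_page(ocr_text, cutoff=0.8):
    """Return (customer, score) for the best fuzzy match.

    Returns (None, 0.0) when nothing reaches the cutoff, which is the
    signal to route the page to the human exception queue.
    """
    words = normalize(ocr_text).split()
    best_customer, best_score = None, 0.0
    for customer, variants in CUSTOMER_VARIANTS.items():
        for variant in variants:
            n = len(variant.split())
            # Slide a window with the variant's word count over the page
            # and score each window by character similarity.
            for i in range(len(words) - n + 1):
                window = " ".join(words[i:i + n])
                score = difflib.SequenceMatcher(None, variant, window).ratio()
                if score > best_score:
                    best_customer, best_score = customer, score
    if best_score >= cutoff:
        return best_customer, best_score
    return None, 0.0
```

Something like classify_page("Purchase order from ACME C0rp.") would be
expected to tolerate the OCR slip (zero for "o") and still land on the Acme
entry, while a page with no recognizable name falls through to the exception
list. The cutoff would need tuning against real faxes.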

Any thoughts are appreciated.

Jeremy


"Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
news:uFOwoWEnGHA.2264@xxxxxxxxxxxxxxxxxxxxxxx
You might want to look at this technology http://www.djvuzone.org/

Office does automatic OCR on TIFFs, which you can receive on your computer.
It works with SQL FTS.

You could also push the PDFs into SQL Server and have them indexed there as
well. The problem is the quality of your OCR.

Here is an article on how to do it.

http://www.indexserverfaq.com/blobs.htm


