Re: Suggestions for extracting masked selections and generating XML


Hi,

Word is not designed to work that way.

Your best approach would be to persuade the people sending you the documents to use formfields to capture the data in each document. It would then be quite simple to extract the data by formfield name, without needing to know much about the layout.
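To give a rough idea of how little code that takes, here is a minimal Python sketch. It rests on several assumptions that are not from your post: the documents are saved as .docx (not the older binary .doc), the senders use legacy text formfields, and the fields are not nested inside other fields. The function names and the <record>/<field> output layout are made up for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_form_fields(document_xml):
    """Return {formfield name: current text} for legacy text formfields."""
    fields = {}
    name = None        # name of the formfield being read, if any
    in_result = False  # True between the field's "separate" and "end" marks
    parts = []
    for el in ET.fromstring(document_xml).iter():
        if el.tag == W + "fldChar":
            kind = el.get(W + "fldCharType")
            if kind == "begin":
                # Only formfields carry ffData/name; other fields are skipped.
                name_el = el.find(W + "ffData/" + W + "name")
                name = name_el.get(W + "val") if name_el is not None else None
                in_result, parts = False, []
            elif kind == "separate":
                in_result = name is not None
            elif kind == "end":
                if name is not None:
                    fields[name] = "".join(parts)
                name, in_result = None, False
        elif el.tag == W + "t" and in_result:
            parts.append(el.text or "")
    return fields


def read_docx_fields(path):
    """Read the main document part of a .docx file and extract its formfields."""
    with zipfile.ZipFile(path) as docx:
        return extract_form_fields(docx.read("word/document.xml"))


def fields_to_xml(fields):
    """Write the name/value pairs as a simple (hypothetical) XML record."""
    root = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(root, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")
```

With something like this, fields_to_xml(read_docx_fields("sample.docx")) would produce one record per document regardless of where the Name, Score and Code formfields sit on the page, so a new layout needs no new code as long as the formfield names stay the same.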

Alternatively, you could take the documents to a document scanning company that has software for creating masks to recognise the data fields (and, if there's a considerable degree of consistency in the various layouts, to recognise which document format goes with each mask). This is much more expensive than the first approach, but it can be quite fast.

--
Cheers
macropod
[Microsoft MVP - Word]


"AutomationHelpinSF" <AutomationHelpinSF@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:9387C733-F1F3-4F83-9B74-8ECE2576759F@xxxxxxxxxxxxxxxx

I have several hundred e-mail messages a day coming in, each in one of
roughly 17-20 different Word "format" files. My goal is to extract the same
data from each of these files so I can put them into a database for further
business processes.

For example, from each file I want to extract a Name, a numerical Score, and
a character Code. In each of the format files, the data is in a different
place in the document. Some have them in a sentence structure in the first
paragraph, some have a list format, etc. As it stands now a human has to
read it and extract the data.

What I would like is a way for a user to take a sample file that's in the
first format, make a selection and somehow "tag" it as if to say "this area
I've selected should be the Score". Do that for all the data items in the
file. Then, have a tool / macros which can then process any file in that
format, extract those selected areas, and generate some other file with
name/value pairs (XML, CSV, etc.)

I could instead write code which could extract the data, however if we get a
new file format that means writing new code. We'd like to delegate that to a
data entry person who can create a "mask" for the new format (they change
frequently as we're processing data from many vendors and there are no
standards in this field).

I am not a hardcore Outlook / Word / Office automation person and so I'm
looking for suggestions as to approach, tools which could be purchased which
would accomplish this, suggestions, etc.

If people have suggestions about more appropriate places to ask, I'd
appreciate that as well.

I appreciate the help, Steve.
