Re: Processing Word documents



Hi Howard,

I assume you are talking about Word 2003 documents? If they were Word 2007
(docx) files you would have more flexibility. If there's a possibility of
these files being Word 2007 documents, have a look at the latest BizTalk HotRod
magazine for examples of pipelines and how to get at the data in them:
http://biztalkhotrod.com/default.aspx
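
The extra flexibility comes from the format itself: a .docx file is a ZIP
package of XML parts, so it can be read straight from a stream without the
Office API. A minimal Python sketch of the idea (not BizTalk code), assuming
the main document part sits at its usual location, word/document.xml. The
function name is made up for illustration.

```python
import zipfile


def read_main_document_xml(stream):
    """Return the raw bytes of the main document part of a .docx stream."""
    # ZipFile accepts any seekable file-like object, so no temp file is needed.
    with zipfile.ZipFile(stream) as package:
        # Assumption: the main part is at the conventional path.
        return package.read("word/document.xml")
```

The returned bytes are plain XML, so form data can be pulled out of them with
an ordinary XML parser.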

Otherwise your approach of using the 2003 interop, saving to a temp file and
reading back from it, is the way to go, I think. One thing you might want to
investigate is whether the 2007 interop has any extra methods you could use to
read the 2003 files.


Thiago Almeida
http://connectedthoughts.wordpress.com

"Howard Siegel" wrote:

Thanks Yossi.

I've been reading more about the Office API and I don't see
where it can take an already opened stream. It wants to
open the file itself. So short of having something external
to BizTalk do the conversion to XML, which BizTalk then reads,
I'm probably going to have to do as you suggest: write the
stream to a file (in %TEMP%), use the Office API to convert it
to XML, and then open the XML file and return its stream to
the next stage in the pipeline. Ugly, but it should work.
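
The round trip described above could be sketched roughly as follows. This is
Python rather than a real pipeline component, and convert_to_xml is a
hypothetical callable standing in for the Office "save as XML" step. Only the
temp-file handling is shown.

```python
import io
import os
import shutil
import tempfile


def doc_stream_to_xml_stream(doc_stream, convert_to_xml):
    """Spool a .doc stream to disk, convert it, and return the XML as a stream.

    convert_to_xml(doc_path, xml_path) is a hypothetical placeholder for the
    Office API call that opens doc_path and saves it as XML to xml_path.
    """
    # A private temp directory avoids name clashes between concurrent messages.
    work_dir = tempfile.mkdtemp(prefix="worddoc_")
    try:
        doc_path = os.path.join(work_dir, "input.doc")
        xml_path = os.path.join(work_dir, "output.xml")

        # 1. Write the incoming stream to a file the converter can open.
        with open(doc_path, "wb") as doc_file:
            shutil.copyfileobj(doc_stream, doc_file)

        # 2. Let the converter produce the XML file.
        convert_to_xml(doc_path, xml_path)

        # 3. Read the XML back into memory so the temp files can be deleted
        #    before the stream is handed to the next stage.
        with open(xml_path, "rb") as xml_file:
            return io.BytesIO(xml_file.read())
    finally:
        # Clean up even if the conversion fails.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Reading the whole XML file into memory keeps the cleanup simple and is fine
for small forms. With a high volume of large documents it may be better to
stream from the file and delete it afterwards.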

The one question I have not gotten an answer to as yet is the
volume of documents I'll need to process.

- h

"Yossi Dahan [MVP]" <yossi.dahan@xxxxxxxxxxxxxxx> wrote in
news:0EB8E243-9AA3-47CA-9BB4-E943DE041B09@xxxxxxxxxxxxx:

I believe the general approach is to use the Office API to open the
document, perform a 'save as' to save it in XML format and then open
that file again.
You have a few options here:
Probably the neatest option is to develop a custom adapter to do that,
but this is not the simplest of tasks.
The second option is to develop a custom pipeline component that would
take the stream, save it as a Word document, open that, save it as XML,
open that and return the stream to the pipeline. Not the most
efficient thing in the world, but it might do the trick. (I don't know if
the Office API has support for streams, so that you don't have to store
the binary file first.) Alternatively, you can do the above in a
generic orchestration. That is probably the simplest implementation, but the
one with the highest overhead, and not as neat from an architectural
perspective.

hth

Yossi Dahan


"Howard Siegel" <not.interested@xxxxxxxxxxxxxx> wrote in message
news:Xns9A95BFE78F49Bhelloimthedoctor@xxxxxxxxxxxxxxxxx
We are going to be receiving MS Word documents (.doc style) that
need to be processed by BizTalk. The Word documents contain
forms from which we need to extract the data which is then
deposited into a database.

I was thinking that a receive pipeline component could process
the stream from a File adapter, extract the data from the forms
and build an XML string to the schema we'll be using.
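
As a rough illustration of that last step only (building the XML once the
form data has been extracted), here is a minimal Python sketch. The root
element and field names are invented for the example. A real component would
use the names from the target schema.

```python
import xml.etree.ElementTree as ET


def build_form_xml(fields, root_name="FormData"):
    """Build an XML string with one child element per extracted form field.

    fields maps element names to text values. The names must be valid XML
    element names (hypothetical here, taken from the schema in practice).
    """
    root = ET.Element(root_name)
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value  # special characters are escaped on serialization
    return ET.tostring(root, encoding="unicode")
```

For example, build_form_xml({"Name": "Howard", "City": "Boston"}) returns a
FormData element containing a Name and a City child.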

I don't know the first thing about processing Word files.

Has anyone done this? Or something that accomplishes the same
thing? Where do I even start?????

- h



