Re: Processing Word documents
- From: Thiago <Thiago@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 7 May 2008 16:20:01 -0700
Hi Howard,
I assume you are talking about Word 2003 documents? If they were Word 2007
(.docx) files you would have more flexibility. If there is a possibility of
these files being Word 2007 documents, have a look at the latest BizTalk HotRod
magazine for examples of pipelines and how to get at the data in them:
http://biztalkhotrod.com/default.aspx
Otherwise, I think your approach of using the 2003 interop, saving to a temp
file and reading back from it, is the way to go. One thing you might want to
investigate is whether the 2007 interop has any extra methods you could use to
read the 2003 files.
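To illustrate the extra flexibility with .docx files: a .docx is a ZIP package, and the body text lives in the word/document.xml part, so it can be read straight from a stream without Word installed. The following is only a rough Python sketch of that idea, not the approach from the magazine; the function name docx_paragraphs is made up for the example, and it assumes the stream is seekable and that only plain paragraph text is wanted (form fields and content controls would need more handling).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml in .docx packages.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(stream):
    """Return the text of each paragraph in a .docx read from a binary stream.

    Hypothetical helper for illustration; `stream` must be seekable because
    the ZIP directory is stored at the end of the package.
    """
    with zipfile.ZipFile(stream) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

In a pipeline component the same steps would apply to the incoming message stream, provided it is buffered into something seekable first.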
Thiago Almeida
http://connectedthoughts.wordpress.com
"Howard Siegel" wrote:
Thanks, Yossi.
I've been reading more about the Office API and I don't see
where it can take an already opened stream. It wants to
open the file itself. So short of having something external
to BizTalk do the conversion to XML, which BizTalk then reads,
I'm probably going to have to do as you suggest: write the
stream to a file (in %TEMP%), use the Office API to convert it
to XML, and then open the XML file and return its stream to
the next stage in the pipeline. Ugly, but it should work.
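The round trip described above might look roughly like the Python sketch below. It is an outline under assumptions, not BizTalk or Office code: convert_via_temp_files is a made-up name, and the convert argument is a hypothetical callable standing in for the Office automation step (open the .doc at one path, save it as XML at another).

```python
import io
import os
import shutil
import tempfile


def convert_via_temp_files(source_stream, convert):
    """Spool a stream to disk, run a path-based converter, return XML as a stream.

    `convert(doc_path, xml_path)` is a hypothetical placeholder for the
    Word automation call; it is not part of any real API.
    """
    with tempfile.TemporaryDirectory() as work_dir:
        doc_path = os.path.join(work_dir, "input.doc")
        xml_path = os.path.join(work_dir, "output.xml")
        with open(doc_path, "wb") as doc_file:
            # Copy the incoming stream to the temporary .doc file.
            shutil.copyfileobj(source_stream, doc_file)
        convert(doc_path, xml_path)
        with open(xml_path, "rb") as xml_file:
            # Read the result fully so both temp files can be deleted on exit.
            return io.BytesIO(xml_file.read())
```

Using a per-message temporary directory keeps concurrent messages from colliding on file names and ensures both files are cleaned up even if the conversion throws.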
The one question I have not gotten an answer to as yet is the
volume of documents I'll need to process.
- h
"Yossi Dahan [MVP]" <yossi.dahan@xxxxxxxxxxxxxxx> wrote in
news:0EB8E243-9AA3-47CA-9BB4-E943DE041B09@xxxxxxxxxxxxx:
I believe the general approach is to use the Office API to open the
document, perform a 'Save As' to save it in XML format, and then open
that file again.
You have a few options here.
Probably the neatest option is to develop a custom adapter to do that,
but this is not the simplest of tasks.
The second option is to develop a custom pipeline component that would
take the stream, save it as a Word document, open that, save it as XML,
open that, and return the stream to the pipeline. Not the most
efficient thing in the world, but it might do the trick. (I don't know
if the Office API has support for streams so that you don't have to
store the binary file first.)
Alternatively, you can do the above in a generic orchestration. This is
probably the simplest implementation, but the one with the highest
overhead, and not as neat from an architectural perspective.
hth
Yossi Dahan
"Howard Siegel" <not.interested@xxxxxxxxxxxxxx> wrote in message
news:Xns9A95BFE78F49Bhelloimthedoctor@xxxxxxxxxxxxxxxxx
We are going to be receiving MS Word documents (.doc style) that
need to be processed by BizTalk. The Word documents contain
forms from which we need to extract the data, which is then
deposited into a database.
I was thinking that a receive pipeline component could process
the stream from a File adapter, extract the data from the forms
and build an XML string to the schema we'll be using.
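For the last step only (building the XML once the form values are in hand), a minimal Python sketch could look like this. The schema is not given in this thread, so the FormData and Field element names and the function name are invented placeholders; extracting the values from the Word form is not shown.

```python
import xml.etree.ElementTree as ET


def form_fields_to_xml(fields, root_name="FormData"):
    """Serialize a {field name: value} mapping to an XML string.

    Element names here are hypothetical; a real implementation would
    follow the target schema instead.
    """
    root = ET.Element(root_name)
    for name, value in fields.items():
        # Carry the field name as an attribute so any form field name is
        # acceptable, even one that is not a valid XML element name.
        field = ET.SubElement(root, "Field", {"name": name})
        field.text = value
    return ET.tostring(root, encoding="unicode")
```

Building the tree through an XML library rather than concatenating strings means characters such as & and < in form values are escaped automatically.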
I don't know the first thing about processing Word files.
Has anyone done this? Or something that accomplishes the same
thing? Where do I even start?
- h