Re: Speed of using OLE for automation




Hi Christian,

I need some information on the speed of OLE with MFC. I have created some
functions to parse an MS-Word document. These functions extract all the
text in the document body, including tables, headers and footers.
Additionally, style, font size, page and line number are extracted.
What I am doing is getting all the paragraphs of the document and handling
every one of them (getting the text, getting style and font
information...). I have to do it this way because I need to find some
specific parts of the text identified by their style, font or text
content.
The challenge is that I need to parse large documents with my
functions. The documents are about 1.5 MB and 80 pages. At the moment, it
takes around 20-30 minutes to check one of these documents. This is
definitely too much. Is there any way to increase the speed of OLE?
I used a profiler to see where the time goes.
Most of the time is spent in calls to:
COleDispatchDriver::InvokeHelper
COleDispatchDriver::CreateDispatch
COleDispatchDriver::~COleDispatchDriver

These three functions take around 90% of the overall time.

Is there a way to improve the speed?

Or do you have any experience of how long it should take to parse such
documents?

If you automate Word, it's going to be slow. If you have to "walk" the
document, it's going to be slow.

You might get an increase in speed if you pack the automation code that
works closely with the Word object model into VBA procedures, most likely
in a template that your app loads as an AddIn object and then unloads when
it's finished. "Native" code runs significantly faster when dealing with
an Office application because it doesn't have to cross the OLE
boundaries on every call.
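
To make the "pack it into VBA" idea more concrete: walking the paragraphs from MFC costs several IDispatch round trips per paragraph (the paragraph itself, its Range, Text, Style, Font, Font.Size, and Information for page and line number), and every intermediate object needs its own COleDispatchDriver wrapper, which is consistent with InvokeHelper and ~COleDispatchDriver dominating the profile. If a VBA function in the add-in template does the walk inside Word and returns everything as one string, the C++ side makes a single call (for example through Word's Application.Run) and only has to parse the result. The sketch below covers just that client-side parsing. The macro name DumpParagraphs and the tab-separated row format are hypothetical, and the macro is assumed to strip tabs and line breaks from the paragraph text.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One record per Word paragraph, as returned by a hypothetical VBA
// function (here called DumpParagraphs) that walks the document inside
// Word, so the client pays for one OLE round trip instead of several
// per paragraph.
struct ParagraphInfo {
    std::string text;
    std::string style;
    double fontSize = 0.0;
    int page = 0;
    int line = 0;
};

// Assumed (hypothetical) format: one paragraph per '\n'-separated row,
// fields separated by '\t' in the order text, style, font size, page, line.
// The VBA side is assumed to strip tabs and line breaks from the text.
std::vector<ParagraphInfo> parseDump(const std::string& dump) {
    std::vector<ParagraphInfo> result;
    std::istringstream rows(dump);
    std::string row;
    while (std::getline(rows, row)) {
        if (row.empty()) continue;
        std::istringstream fields(row);
        ParagraphInfo p;
        std::string size, page, line;
        if (!std::getline(fields, p.text, '\t') ||
            !std::getline(fields, p.style, '\t') ||
            !std::getline(fields, size, '\t') ||
            !std::getline(fields, page, '\t') ||
            !std::getline(fields, line, '\t')) {
            continue;  // fewer than five fields: skip the malformed row
        }
        p.fontSize = std::strtod(size.c_str(), nullptr);
        p.page = std::atoi(page.c_str());
        p.line = std::atoi(line.c_str());
        result.push_back(p);
    }
    return result;
}
```

On the VBA side, building the font size with Str() rather than CStr() keeps the decimal separator a period regardless of the Windows locale, which matters for strtod here. The gain depends on how many calls per paragraph the current code makes.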

If you didn't need the page and line numbers, I'd say save the documents
as RTF or (Word 2003 and later) XML, then extract the information from
them without opening Word. But if you need line and page numbers, you
have no choice but to automate Word, because Word has to lay out the
document dynamically to calculate this information.
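
If page and line numbers can be dropped, reading the saved XML does not need Word at all. A rough sketch, assuming the documents have already been saved in Word 2003's XML format (WordprocessingML), where a paragraph is w:p, its text sits in w:t elements inside w:r runs, and the paragraph style is the w:val of w:pStyle (a style ID such as Heading1, not the display name). It is naive string scanning rather than real XML parsing: it expects the literal w: prefix, ignores paragraphs nested in text boxes and decodes only the five predefined entities. A proper XML parser such as MSXML would be the safer choice for real documents; the function names here are illustrative.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// One paragraph extracted from a WordprocessingML (Word 2003 XML) file.
struct XmlParagraph {
    std::string styleId;  // w:val of <w:pStyle>, e.g. "Heading1"; empty if none
    std::string text;     // concatenated contents of the paragraph's <w:t> elements
};

// True if the start tag `name` begins at `pos`, i.e. it is followed by '>',
// ' ' or '/', so "<w:t" does not match "<w:tbl" or "<w:tab/>".
static bool startsTag(const std::string& s, std::size_t pos, const char* name) {
    std::size_t len = std::strlen(name);
    if (s.compare(pos, len, name) != 0) return false;
    std::size_t next = pos + len;
    return next < s.size() && (s[next] == '>' || s[next] == ' ' || s[next] == '/');
}

// Decodes the five predefined XML entities; numeric references are left as-is.
static std::string unescapeXml(const std::string& s) {
    static const std::pair<const char*, char> entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}};
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        bool decoded = false;
        if (s[i] == '&') {
            for (const auto& e : entities) {
                std::size_t len = std::strlen(e.first);
                if (s.compare(i, len, e.first) == 0) {
                    out += e.second;
                    i += len;
                    decoded = true;
                    break;
                }
            }
        }
        if (!decoded) out += s[i++];
    }
    return out;
}

// Naive scan: assumes the literal "w:" prefix and no paragraphs nested in
// text boxes. Not a substitute for a real XML parser.
std::vector<XmlParagraph> extractParagraphs(const std::string& xml) {
    std::vector<XmlParagraph> paras;
    const std::string styleAttr = "<w:pStyle w:val=\"";
    std::size_t pos = 0;
    while ((pos = xml.find("<w:p", pos)) != std::string::npos) {
        if (!startsTag(xml, pos, "<w:p")) { pos += 4; continue; }  // <w:pPr>, <w:pgSz>, ...
        if (xml[pos + 4] == '/') {  // empty paragraph <w:p/>
            paras.emplace_back();
            pos += 4;
            continue;
        }
        std::size_t end = xml.find("</w:p>", pos);
        if (end == std::string::npos) break;
        const std::string body = xml.substr(pos, end - pos);
        XmlParagraph para;

        // Paragraph style ID, if the paragraph has one.
        std::size_t st = body.find(styleAttr);
        if (st != std::string::npos) {
            std::size_t valStart = st + styleAttr.size();
            std::size_t valEnd = body.find('"', valStart);
            if (valEnd != std::string::npos)
                para.styleId = body.substr(valStart, valEnd - valStart);
        }

        // Concatenate the text of every <w:t> element in the paragraph.
        std::size_t t = 0;
        while ((t = body.find("<w:t", t)) != std::string::npos) {
            if (!startsTag(body, t, "<w:t") || body[t + 4] == '/') { t += 4; continue; }
            std::size_t open = body.find('>', t);
            std::size_t close = body.find("</w:t>", open);
            if (open == std::string::npos || close == std::string::npos) break;
            para.text += unescapeXml(body.substr(open + 1, close - open - 1));
            t = close + 6;
        }
        paras.push_back(para);
        pos = end + 6;
    }
    return paras;
}
```

Font size could be read the same way from w:sz (in half-points) in the run or paragraph properties, falling back to the style definitions when it is not set directly. Page and line numbers are simply not in the file, which is why this route only works if they are not needed.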

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by e-mail :-)
