Re: Very Slow character counting in Word 2003

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Klaus Linke (info_at_fotosatz-kaufmann.de.no.junk)
Date: 05/21/04


Date: Fri, 21 May 2004 04:22:05 +0200

Hi Edgar,

  For CharCounter = 1 To CurrentRange.Characters.Count
  ' Do something with CurrentRange.Characters(CharCounter)
is a terribly slow way to loop characters.
For each character, Word has to count through all the characters from the
first up to CharCounter to locate that character.
In large documents, this will take a time that is proportional to the
square of the number of characters.

Somewhat faster:
  For Each charLoop in CurrentRange.Characters
  ' Do something with charLoop
This will take a time that is proportional to the number of characters.
(After one character has been processed, Word will simply go to the next
one)
But if you need to access the properties for each character (font,
underline ...) it still takes a long time.
Word keeps track of which ranges are, say, underlined. It takes some time
to figure out whether some particular character is in some underlined
range.

Much faster -- Avoid accessing individual characters using "Find/Replace":
Take a count of characters.
Then use "Find/Replace", to delete, say, underlined characters, and get the
count of characters again.
Take the difference. Then undo the deletion.
Word already "knows" the ranges that are underlined (for example).
So "Find/Replace" doesn't need to look at individual characters, nor figure
out how some specific range/character is formatted.

See http://word.mvps.org/faqs/macrosvba/GetNoOfReplacements.htm for some
example code.

Regards,
Klaus

"Edgar E. Cayce" <edgarecayce@yahoo.com> wrote:
> I am writing a character counting application which needs to go
> through hundreds of Word documents, counting the characters in
> each one. I need to do things like count double for bold/italic/
> underline and count font transitions, etc., so just asking Word for
> a char count does not do it for me.
>
> When the bit of code that does the actual counting executes, it is
> awfully slow - like 10-20 characters per second.
>
> The app that calls this is VBA from Access 2003.
>
> What I do is get a range for each "story" in the document (body,
> headers, footers, etc.) and run the following code on the range
> (note that Counter is a structure I use to hold the various counts).
> I find that it takes several seconds between each debug printing
> that it has counted 100 chars:
>
> Dim BoldState As Boolean
> Dim UnderlineState As Boolean
> Dim ItalicState As Boolean
> Dim LastFontName As String
> Dim LastFontSize As Long
> Dim LastFontColor As Long
> Dim CharCounter As Long
> Dim TheChar As Object
>
> BoldState = False
> UnderlineState = False
> ItalicState = False
> LastFontName = CurrentRange.Characters(1).Font.Name
> LastFontSize = CurrentRange.Characters(1).Font.Size
> LastFontColor = CurrentRange.Characters(1).Font.Color
>
> For CharCounter = 1 To CurrentRange.Characters.Count
> If CharCounter Mod 100 = 0 Then Debug.Print CharCounter
> With CurrentRange.Characters(CharCounter)
> If .Text = " " Then
> Counter.Spaces = Counter.Spaces + 1
> ElseIf .Text = vbTab Then
> Counter.Tabs = Counter.Tabs + 1
> ElseIf .Text = vbCr Then
> Counter.Returns = Counter.Returns + 1
> Else ' else it is a char, the only one where we care about
> bold, font, etc.
> Counter.Chars = Counter.Chars + 1
> If .Case = wdUpperCase Then
> Counter.CapitalChars = Counter.CapitalChars + 1
> End If
>
> ' note for these, we check transition as well as
> presence.
> If .Font.Bold Then
> Counter.BoldChars = Counter.BoldChars + 1
> If BoldState = False Then
> Counter.BoldTransitions =
> Counter.BoldTransitions + 1
> End If
> BoldState = True
> Else
> If BoldState = True Then
> Counter.BoldTransitions =
> Counter.BoldTransitions + 1
> End If
> BoldState = False
> End If
>
> If .Font.Underline Then
> Counter.UnderlineChars = Counter.UnderlineChars +
> 1
> If UnderlineState = False Then
> Counter.UnderlineTransitions =
> Counter.UnderlineTransitions + 1
> End If
> UnderlineState = True
> Else
> If UnderlineState = True Then
> Counter.UnderlineTransitions =
> Counter.UnderlineTransitions + 1
> End If
> UnderlineState = False
> End If
>
> If .Font.Italic Then
> Counter.ItalicChars = Counter.ItalicChars + 1
> If ItalicState = False Then
> Counter.ItalicTransitions =
> Counter.ItalicTransitions + 1
> End If
> ItalicState = True
> Else
> If ItalicState = True Then
> Counter.ItalicTransitions =
> Counter.ItalicTransitions + 1
> End If
> ItalicState = False
> End If
>
> If .Font.Name <> LastFontName Then
> Counter.FontTransitions = Counter.FontTransitions
> + 1
> LastFontName = .Font.Name
> End If
> If .Font.Size <> LastFontSize Then
> Counter.FontTransitions = Counter.FontTransitions
> + 1
> LastFontSize = .Font.Size
> End If
> If .Font.Color <> LastFontColor Then
> Counter.FontTransitions = Counter.FontTransitions
> + 1
> LastFontColor = .Font.Color
> End If
>
> End If
>
> End With
> Next CharCounter
>
>
> Can anyone help me figure out how to speed this up? I saw something
> posted about how using Range.Characters(CharCounter) to index the
> chars would be slow, so I tried a For Each [CharObj] in
> Range.Characters instead, but it did not seem to be any faster - and
> am I getting them in order when I do that?
>
> Ed



Relevant Pages