Re: Find & Replace in Multi-Megabyte Strings





Rex Avery... REGEX Avery should be your name!


Have a look at regex (regular expressions).

The following test builds a test string of about 28 million characters.
It still loops through all the replacements, one pass per pattern, but
the regex Replace is a "trifle" faster than VBA's.

The following example takes 20 seconds for the 6 replacements
(but note that each replacement occurs about 1 million times!).

The example uses late-bound code. It's a little bit faster when you
reference "Microsoft VBScript Regular Expressions 5.5" and Dim the rgx
variable As RegExp.
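
For comparison, the early-bound form might look like the sketch below. It assumes the reference has been ticked under Tools > References; the Sub name and the single test pattern are only for illustration.

```vba
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Sub TestRegExEarlyBound()
    Dim rgx As RegExp            ' early-bound type from the referenced library
    Set rgx = New RegExp
    With rgx
        .Global = True
        .IgnoreCase = True
        .Pattern = "\bstring\b"
        ' prints: This is the text I have.
        Debug.Print .Replace("This is the string I have.", "text")
    End With
End Sub
```

Without the reference, the Dim line will not compile, which is why the posted example sticks to CreateObject.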

You'll have to learn some regex patterns though!
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vspropattern.asp

Or try and buy RegexBuddy (be aware that RegexBuddy supports more
advanced flavors than VBScript's regex).
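
As a small taste of why the patterns matter, here is a sketch of what the \b (word boundary) anchors used in the test do. The Sub name is made up for the illustration.

```vba
Sub WordBoundaryDemo()
    Dim rgx As Object
    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Global = True
    rgx.IgnoreCase = True

    ' No boundaries: also hits the "is" inside "This"
    rgx.Pattern = "is"
    Debug.Print rgx.Replace("This is the string I have.", "was")
    ' prints: Thwas was the string I have.

    ' With boundaries: whole word only
    rgx.Pattern = "\bis\b"
    Debug.Print rgx.Replace("This is the string I have.", "was")
    ' prints: This was the string I have.
End Sub
```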

Sub TestRegExReplace()

    Dim str$, arrPat, arrRep
    Dim rgx As Object
    Dim t!, n&

    'This is slow but never mind..
    'it builds a string of 28 311 552 chars (27 chars doubled 20 times).
    str = "This is the string I have." & vbLf
    For n = 1 To 20
        str = str & str
    Next
    Debug.Print Len(str)

    arrPat = Array("This", "is", "the", "string", "I", "have")
    arrRep = Array("That", "was", "that", "text", "you", "had")

    t = Timer
    'Late bound: no library reference needed
    Set rgx = CreateObject("vbscript.regexp")
    With rgx
        .Global = True        'replace every occurrence, not just the first
        .IgnoreCase = True
        For n = LBound(arrPat) To UBound(arrPat)
            '\b = word boundary, so "is" does not match inside "This"
            .Pattern = "\b" & arrPat(n) & "\b"
            str = .Replace(str, arrRep(n))
        Next
    End With

    t = Timer - t
    'Show the first two lines of the result and the elapsed time
    MsgBox Left(str, InStr(50, str, vbLf)) & "in " & t & " seconds"

End Sub
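
The test above still makes one pass per pattern. A true single-pass variant could be sketched as below: combine the words into one alternation, run Execute once, and stitch the result together from the matches. This is an untimed sketch, not a drop-in replacement. ReplaceAllOnePass is a hypothetical name, it assumes the search terms are plain words with no regex metacharacters, and whether it beats six Replace calls on a 28 MB string would have to be measured.

```vba
' Hypothetical helper: one regex pass over the text instead of one per pattern
Function ReplaceAllOnePass(ByVal txt As String, arrPat, arrRep) As String
    Dim rgx As Object, dict As Object, mc As Object, m As Object
    Dim parts() As String
    Dim n As Long, pos As Long, k As Long

    ' Map each search word to its replacement, ignoring case
    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare
    For n = LBound(arrPat) To UBound(arrPat)
        dict(arrPat(n)) = arrRep(n)
    Next

    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Global = True
    rgx.IgnoreCase = True
    ' Assumption: the words contain no regex metacharacters
    rgx.Pattern = "\b(" & Join(arrPat, "|") & ")\b"

    Set mc = rgx.Execute(txt)
    If mc.Count = 0 Then
        ReplaceAllOnePass = txt
        Exit Function
    End If

    ' Collect untouched text and replacements alternately, then join once
    ReDim parts(0 To 2 * mc.Count)
    pos = 1
    For Each m In mc
        ' FirstIndex is 0-based, Mid$ is 1-based
        parts(k) = Mid$(txt, pos, m.FirstIndex + 1 - pos)
        parts(k + 1) = dict(m.Value)
        k = k + 2
        pos = m.FirstIndex + 1 + m.Length
    Next
    parts(k) = Mid$(txt, pos)      ' tail after the last match

    ReplaceAllOnePass = Join(parts, "")
End Function
```

Calling ReplaceAllOnePass("This is the string I have.", arrPat, arrRep) with the two arrays from the test returns "That was that text you had." Note one difference in behaviour: because every word is replaced in the same pass, the output of one replacement is never rescanned by a later pattern, which the sequential loop would do.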


HTH... (but I'm fairly sure it does)




--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


R Avery wrote:

> This is a re-post of "Batch Replace function for Large strings"
>
> String processing in VBA is very slow when strings are large (1-500
> MB). I have a function that I've been using for doing batch replace
> ops (in the above referenced previous post), but it chokes on large
> strings with many replacements to do (like 50), because it has to do
> 50 passes of the string to perform the replacements.
>
> Has anyone written a fast function designed to do the same thing for
> large strings but only makes one pass through the data?