Re: Fastest way to search a string for the occurance of a word??

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance

From: Niki Estner (niki.estner_at_cube.net)
Date: 08/11/04


Date: Wed, 11 Aug 2004 13:31:20 +0200


"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in
news:MPG.1b840d145a87447b98b121@msnews.microsoft.com...
> Niki Estner <niki.estner@cube.net> wrote:
> > > I usually find the exact opposite. Regular expressions are extremely
> > > powerful, but for simple string testing, I usually find a hard-coded
> > > test to be about an order of magnitude faster than using regular
> > > expressions.
> >
> > Really?
> >
> > I've build a little test sample, to find out what cases you mean, maybe
> > there's a mistake in it?
>
> Well, most of the kinds of tests I was talking about are doing more
> than just IndexOf - things like checking for a number being (probably)
> parseable, etc.

Maybe, but the OP's question was what's the "Fastest way to search a string
for the occurance of a word?".

> The test case you gave showed roughly the same results for both methods
> - sometimes faster for IndexOf, sometimes faster for Regex.

Really? in all the tests I did here, the Regex was by far superior.

Examples on my PC:
stringLength = 100; repeatCount = 100000;
Testing String.IndexOf: 00:00:00.3750000
Testing Regex.Match: 00:00:00.0625000

stringLength = 1000; repeatCount = 100000;
Testing String.IndexOf: 00:00:03.1406250
Testing Regex.Match: 00:00:00.5625000

stringLength = 10000; repeatCount = 100000;
Testing String.IndexOf: 00:00:02.3437500
Testing Regex.Match: 00:00:00.5312500

The Regex is about 5 times faster.

If the regex is compiled:
stringLength = 100; repeatCount = 100000;
Testing String.IndexOf: 00:00:00.3437500
Testing Regex.Match: 00:00:00.0625000

stringLength = 1000; repeatCount = 100000;
Testing String.IndexOf: 00:00:00.9375000
Testing Regex.Match: 00:00:00.1562500

stringLength = 10000; repeatCount = 100000;
Testing String.IndexOf: 00:00:10.6562500
Testing Regex.Match: 00:00:01.2031250

That's about 5-10 times faster.

> The way I
> see it, String.IndexOf *should* be faster, as it's a more blunt
> instrument - it doesn't need to work out anything clever about what the
> regex is doing, etc.

Wrong. RegEx's use the Boyer-Moore algorithm, which does take a little
time&space for initialization, but is magnitudes faster in matching. Also,
they can be compiled, in which case they whould have to be compared to a
matching routine specially built for finding one string.

> Using RegexOptions.Compiled did (on a single-test basis :) seem a bit
> faster, but that has implications in terms of memory use. (It's fine if
> you've got a static search string, but no good otherwise.)

Yes, that's why it's an option. You're not supposed to use it if you got new
regular expressions all the time.

> Of course,
> using regular expressions is likely to generate more garbage (if only
> the Match instance) than String.IndexOf (which really shouldn't be
> generating any garbage at all, I'd have thought).

If the regex doesn't contain any capturing paranthesis, it shouldn't
generate more than the Match object either (plus the initialization data
when the RegEx object is created).

> Personally I'd use IndexOf just for clarity's sake, to be honest.

I do too, if speed doesn't matter.
But again, that wasn't the OP's question.

Niki



Relevant Pages

  • Re: Fastest way to search a string for the occurance of a word??
    ... but the OP's question was what's the "Fastest way to search a string ... in all the tests I did here, the Regex was by far superior. ... However, of course, if you've got new regular expressions all ... Sure - but just that extra Match object could be relevant if the search ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: regular expression help
    ... Basically because if you remove everything that is optional in the regex below you end up with an empty regex: ... So the regex engine will try to match on every character in the string: ... , comma doesn't match, but the nothingness in front of it does. ... A quote followed by any sequence of characters that is not a quote, ...
    (microsoft.public.dotnet.framework)
  • Re: Regex optimization
    ... I was hoping that someone with knowledge of the Regex engine could ... match per string for either Regex. ... reluctant modifier, may be slower .*?, +? ... Variable parts will try to capture as much as possible. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Regex Capture problem
    ... "learned" my regex using a freeware utility that had slightly different ... was trying to capture instead of. ... I have used Regex utilities before, so I understand the concepts of text ... Function RESub(str As String, SrchFor As String, ReplWith As String) As String ...
    (microsoft.public.excel.programming)
  • Re: Trim a multiple line message to a single line
    ... You can do this quite easily with either a regex or a simple function I'll try to demonstrate both: ... private string LayoutInput ... Could you send a sample file with two of these data blocks f what ...
    (microsoft.public.dotnet.languages.csharp)