Re: Fastest way to search a string for the occurance of a word??
From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: 08/11/04
- Next message: Joris Dobbelsteen: "Re: AutoResetEvent doesn't guarentee order"
- Previous message: Tim Gallivan: "Re: Traceroute problems with Sockets"
- In reply to: Niki Estner: "Re: Fastest way to search a string for the occurance of a word??"
- Next in thread: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Reply: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Reply: Niki Estner: "Re: Fastest way to search a string for the occurance of a word??"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 11 Aug 2004 13:40:05 +0100
Niki Estner <niki.estner@cube.net> wrote:
> > Well, most of the kinds of tests I was talking about are doing more
> > than just IndexOf - things like checking for a number being (probably)
> > parseable, etc.
>
> Maybe, but the OP's question was what's the "Fastest way to search a string
> for the occurance of a word?".
True - but you did use the word "generally" in your first post, which
was what I was replying to.
> > The test case you gave showed roughly the same results for both methods
> > - sometimes faster for IndexOf, sometimes faster for Regex.
>
> Really? in all the tests I did here, the Regex was by far superior.
<snip results>
Interesting - entirely different from my results. I would suggest that
such short tests aren't a very good measure, however. I used a string
length of 10000 and a repeat count of 300000 to get reasonable results.
(That's using a random seem of 10, btw - using a fixed seed ensures
reasonably repeatable results.)
Using only results of 5 seconds or more, I never saw either method get
more than about a 20% advantage over the other - and even that was only
after making the regular expression compiled. I wonder what the
difference is that's making your regex implementation so much faster?
One thing which makes a *huge* difference is if you make sure that
there's a match, and make it early in the string. At that stage,
String.IndexOf works very quickly compared with Regex. (Try appending
the search string to the StringBuilder before doing the rest of the
appends and you'll see what I mean.)
> > The way I
> > see it, String.IndexOf *should* be faster, as it's a more blunt
> > instrument - it doesn't need to work out anything clever about what the
> > regex is doing, etc.
>
> Wrong. RegEx's use the Boyer-Moore algorithm, which does take a little
> time&space for initialization, but is magnitudes faster in matching.
I haven't studied that for a while (and the web page I've found
describing it is too dense to take in during a lunch break :) but I
thought it only really helped if there were repeated characters within
the text being searched for?
> Also, they can be compiled, in which case they whould have to be compared to a
> matching routine specially built for finding one string.
Yup (hence my comment in the previous post).
> > Using RegexOptions.Compiled did (on a single-test basis :) seem a bit
> > faster, but that has implications in terms of memory use. (It's fine if
> > you've got a static search string, but no good otherwise.)
>
> Yes, that's why it's an option. You're not supposed to use it if you got new
> regular expressions all the time.
Indeed. However, of course, if you've got new regular expressions all
the time, you shouldn't be testing with a single fixed regular
expression which is precreated, which is what your test does. Depending
on what the OP's actual use will be, he should either specify
RegexOptions.Compiled, or move the Regex constructor call to the inside
of the loop when testing.
> > Of course,
> > using regular expressions is likely to generate more garbage (if only
> > the Match instance) than String.IndexOf (which really shouldn't be
> > generating any garbage at all, I'd have thought).
>
> If the regex doesn't contain any capturing paranthesis, it shouldn't
> generate more than the Match object either (plus the initialization data
> when the RegEx object is created).
Sure - but just that extra Match object could be relevant if the search
itself is fast (and generates a match, of course - otherwise
Match.Empty will be returned all the time).
> > Personally I'd use IndexOf just for clarity's sake, to be honest.
>
> I do too, if speed doesn't matter.
> But again, that wasn't the OP's question.
I still would if speed was *generally* important but not the absolutely
dominating factor. Basically, I'd suggest that the OP uses IndexOf and
checks to see whether this is actually a significant part of the time
taken per request.
-- Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet If replying to the group, please do not mail me too
- Next message: Joris Dobbelsteen: "Re: AutoResetEvent doesn't guarentee order"
- Previous message: Tim Gallivan: "Re: Traceroute problems with Sockets"
- In reply to: Niki Estner: "Re: Fastest way to search a string for the occurance of a word??"
- Next in thread: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Reply: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Reply: Niki Estner: "Re: Fastest way to search a string for the occurance of a word??"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|