Re: Fastest way to search a string for the occurance of a word??
From: Niki Estner (niki.estner_at_cube.net)
Date: 08/11/04
- Next message: Nicholas Paldino [.NET/C# MVP]: "Re: DTS application"
- Previous message: Etienne Boucher: "Re: re:Redim a String[]"
- In reply to: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Next in thread: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Reply: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 11 Aug 2004 17:37:53 +0200
"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in
news:MPG.1b8444cfa741b7d398b12b@msnews.microsoft.com...
> ...
> Well, you're using "whatever case we happen to get" performance -
Performing thousands of tests over random data, adding up the results:
That's average performance in my understanding.
You may have read somewhere that quicksort has O(n*log n) performance in the
average, O(nē) in the worst case. Average case = sort random data; Worst
case = sort data that is already sorted.
> without knowing anything about the real world data, we don't know what
> "average-case" performance is going to be at all.
???
You think you can't say e.g. whether bubblesort on the average performs
better/equally/worse than quicksort without knowing what data will be
sorted?
They do teach that topic differently where I come from...
> For instance, where
> did you get "1234" from? The results change very dramatically if you
> change it to "123" for instance, presumably because "123" is found
> fairly early in the string, which is where String.IndexOf does better.
> (Searching for "123" gives results of 2.89s for String.IndexOf, and
> 4.52s for regular expressions, for a million iterations (one test
> only.))
Again, on my PC, Regex is 5 times faster.
BTW: I've tested that on my laptop, too. Same results.
> ...
> I'm not sure that "corrected" is the right word for what I did -
> certainly "commented on".
Then I must have misunderstood the sentence "I usually find the exact
opposite". To me that sounded a lot like a correction...
> ...
> If I'd had the same results you'd had, I would probably have
> recommended REs, yes - but definitely with a caveat that it's worth
> checking that this really is the performance bottleneck, as IndexOf is
> more readable and in most cases has "good enough" performance. When
> advocating any "more complex" code for performance reasons, I tend to
> put such a caveat in. (I consider regular expressions more complex as
> you need to be careful about the search string - if it contains any
> characters that are "special" for regexes, you may need to escape them,
> for instance.)
That *was* my advice. I *did* tell him to find out what's the bottleneck of
his app, any do a few performance tests if string matching turns out to be
that bottleneck.
> Having looked at my reply to you, btw, I noticed I'd used the phrase
> "hard-coded", and so I thought I'd apply that to our particular test. I
> assume this is pretty much what Boyes-Moore does:
> ...
>
> (I think that's correct, isn't it?)
I think it should read:
static int Find1234 (string haystack)
{
int index=3;
int max = haystack.Length-3;
while (index < max)
{
switch (haystack[index])
{
case '1':
if ((haystack[index+1] == '2') &&
(haystack[index+2] == '3') &&
(haystack[index+3] == '4'))
return index;
break;
case '2':
if ((haystack[index-1] == '1') &&
(haystack[index+2] == '3') &&
(haystack[index+3] == '4'))
return index;
break;
case '3':
if ((haystack[index-2] == '1') &&
(haystack[index-1] == '2') &&
(haystack[index+3] == '4'))
return index;
break;
case '4':
if ((haystack[index-3] == '1') &&
(haystack[index-2] == '2') &&
(haystack[index-1] == '3'))
return index;
break;
}
index+=4;
}
return -1;
}
On my PC, that's about 20-30% faster than a compiled Regex.
But I guess if the string was variable, and the function had to use
table-lookups instead of constants, it would probably be slower.
> I haven't tried to optimise it any further, but it already just about
> out-performs the other ways in a few test cases. (That's not using
> compiled reg-exes, admittedly - just your original test code.)
Yep, that was why I suggested that in my original reply.
Niki
- Next message: Nicholas Paldino [.NET/C# MVP]: "Re: DTS application"
- Previous message: Etienne Boucher: "Re: re:Redim a String[]"
- In reply to: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Next in thread: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Reply: Jon Skeet [C# MVP]: "Re: Fastest way to search a string for the occurance of a word??"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|