Re: Replace strings in a text file and get the number ofreplacementsmade
- From: "vanderghast" <vanderghast@com>
- Date: Wed, 8 Jul 2009 07:59:10 -0400
There is a difference between matching and replacing.
Someone can say "ohoh" is matched twice in "ohohoh", once starting at position 0 and once starting at position 2,
but if you speak to replace (consume) it, you have only one possible 'action'.
I haven't tried, but I assume Regex would find 2 matches, while replace will replace just once the pattern.
And again, (InitialStrring.Length-InitialString.Replace(pattern, String.Empty).Length) / pattern.Length is 'safe', as far as I know, for all cases, and use no external loop, to count the number of replacements where will be of pattern into InitialString (by whatever newPattern, which is irrelevant).
Vanderghast, Access MVP
"Jesse Houwing" <jesse.houwing@xxxxxxxxxxxxxxxx> wrote in message news:e5317a7e78012e2d8cbcde395c74afa@xxxxxxxxxxxxxxxxxxxxx
Hello Peter,
Agreed on the readability part, but using regex.replace opens up a new can of worms, which people aren't usually prepared for. Say this search/replace action can be entered from the UI, then adding . or * or { into your search pattern can lead to unexpected behaviour, or worse a regex parse error. The regex will also be expensive, because it will have to be parsed/compiled every time a new pattern is used (and if it is a user defined replacement, that would be more often than not).
So this would have to be extended with a Regex.Escape call first. The same applies for the replacement pattern. Say I want to search $2 and replace it with $0.1 you'd get funny things... ($2.1 actually)... So it isn't just using a different call to get the same results.
That said, I'd opt for an extention method on string and write an efficient version (could use mine as an example) of a Replace method that returns the number of matches. And from that moment on, use that. Just as readable (or even better) and no crazy unexpected regex problems due to not exactly understanding what is involved.
Jesse
On Tue, 07 Jul 2009 14:45:02 -0700, Jesse Houwing--
<jesse.houwing@xxxxxxxxxxxxxxxx> wrote:
Looping through the string in a stringbuilder is probably the safestIf you simply call StringBuilder.Replace(string, string, int, int)
way to do this:
[...]
int count = 0;
for (int i = 0; i < sb.Length; )
{
if (AreEqual(sb, search, i))
{
sb.Remove(i, search.Length);
sb.Insert(i, replacement);
i += replacement.Length;
count++;
}
else
{
i++;
}
}
[...]
It might be faster to use sb.Replace(search, replacement, i,
search.Length)
instead of a sb.Remove, sb.Insert, I'm not sure, but they won't
differ that much.
instead of having your own AreEqual() method followed by a call to
Remove() and Insert(), the performance should be practically
identical, but you wouldn't get any information about how many
replacements occurred.
Alternatively, if you still call AreEqual() and then call
StringBuilder.Replace(string, string, int, int), you're duplicating
effort (which costs performance), because
StringBuilder.Replace(string, string, int, int) has to actually do
the string comparison again. That would actually be _slower_ than
your original code.
You could do a little hack by searching for the first character that
differs between the search and replacement strings (as an
initialization, not as part of the loop), and then bumping a counter
after each call to StringBuilder.Replace() based on whether the
character at the same offset within the current StringBuilder has
changed. That would be only slightly slower than just calling
StringBuilder.Replace(string, string, int, int), but would include
the count.
That said, I would hope that any Replace() method in .NET, including
Regex.Replace(), String.Replace(), or StringBuilder.Replace() would be
faster than the code you posted. The main reason being that all of
those methods have the opportunity to optimize the construction of
the new string, whereas your example doesn't optimize at all.
At the very least, I would not use the Remove()/Insert() pattern
you've shown. Instead, I would use a String as input, and a
StringBuilder as output, appending text segments to the output
StringBuilder as I scan the input String. That way the code avoids
having to repeatedly shift your character buffer in the StringBuilder
(which happens _twice_ for each replacement in your code). That's
exactly the kind of optimization I'd expect to find inside the .NET
classes.
It might even be worthwhile to defer creation of the output
StringBuilder until you detect the first match that needs to be
replaced, if there's an expectation that for a significant frequency
of input, no replacements would be needed.
It is probably a lot faster than using a regex, though I haven't doneI would expect Regex to be on par with other explicit mechanisms like
any measurements.
that, especially given the need to count the replacements (which for
non-Regex solutions requires replacing the search text twice). If
performance is an issue, then a "scan and build" approach as I suggest
above is probably slightly faster than using built-in Replace()
methods simply because you can incorporate a count into the
replacement logic.
All that said, if performance is an issue (and there's nothing in the
OP to suggest it is), the only way to know for sure what the best
solution is would be to try the different alternatives and measure
them. Even theoretical advantages and disadvantages may be
irrelevant for a typical data set, and intuition is a terrible way to
measure performance. :)
For best performance, it may be that none of the suggestions offered
so far are probably suitable. There's an optimized text search
algorithm, the name of which I can't recall at the moment, that can
probably be adapted, but if not then a degenerate state-graph
implementation (since there's only one string to search for) would
probably work too. Either approach would avoid having to keep
performing full string comparisons at each character index in the
original string (consider an original string
"aaaaaaaaaaaaaaaaaaaaaaaaaa" where you want to replace all occurrences
of "aaaaaab" with something :) ).
But even there, as I said, there's no way to know for sure without
measuring. Performance of the various choices is to some extent going
to be data dependent; liabilities that exist in the general case
might not really be that much of a problem. For example, if dealing
with essentially random data, it's not too terrible to just keep
comparing over and over at an incremented index, because those
comparisons will normally terminate quickly when there's no match.
In other words, even the theoretically worst-case implementation might
not turn out to be much different than the more optimized ones.
Until there's a performance problem shown, the OP should stick with
whatever solution _reads_ the best, and is the most maintainable. And
if there is a performance problem shown, measuring each viable
alternative is the only way to know for sure which will be fastest.
Pete
Jesse Houwing
jesse.houwing at sogeti.nl
.
- References:
- Re: Replace strings in a text file and get the number of replacementsmade
- From: Peter Duniho
- Re: Replace strings in a text file and get the number ofreplacementsmade
- From: Jesse Houwing
- Re: Replace strings in a text file and get the number of replacementsmade
- Prev by Date: RE: Label and TextAlign
- Next by Date: Re: C# program creating folder on server/access denied
- Previous by thread: Re: Replace strings in a text file and get the number ofreplacementsmade
- Next by thread: Re: Replace strings in a text file and get the number of replacements made
- Index(es):
Relevant Pages
|
Loading