Re: Replace strings in a text file and get the number of replacementsmade
- From: "Peter Duniho" <no.peted.spam@xxxxxxxxxxxxxxxxxx>
- Date: Tue, 07 Jul 2009 15:48:44 -0700
On Tue, 07 Jul 2009 14:45:02 -0700, Jesse Houwing <jesse.houwing@xxxxxxxxxxxxxxxx> wrote:
Looping through the string in a stringbuilder is probably the safest way to do this:
[...]
int count = 0;
for (int i = 0; i < sb.Length; )
{
if (AreEqual(sb, search, i))
{
sb.Remove(i, search.Length);
sb.Insert(i, replacement);
i += replacement.Length;
count++;
}
else
{
i++;
}
}
[...]
It might be faster to use sb.Replace(search, replacement, i, search.Length)
instead of a sb.Remove, sb.Insert, I'm not sure, but they won't differ that much.
If you simply call StringBuilder.Replace(string, string, int, int) instead of having your own AreEqual() method followed by a call to Remove() and Insert(), the performance should be practically identical, but you wouldn't get any information about how many replacements occurred.
Alternatively, if you still call AreEqual() and then call StringBuilder.Replace(string, string, int, int), you're duplicating effort (which costs performance), because StringBuilder.Replace(string, string, int, int) has to actually do the string comparison again. That would actually be _slower_ than your original code.
You could do a little hack by searching for the first character that differs between the search and replacement strings (as an initialization, not as part of the loop), and then bumping a counter after each call to StringBuilder.Replace() based on whether the character at the same offset within the current StringBuilder has changed. That would be only slightly slower than just calling StringBuilder.Replace(string, string, int, int), but would include the count.
That said, I would hope that any Replace() method in .NET, including Regex.Replace(), String.Replace(), or StringBuilder.Replace() would be faster than the code you posted. The main reason being that all of those methods have the opportunity to optimize the construction of the new string, whereas your example doesn't optimize at all.
At the very least, I would not use the Remove()/Insert() pattern you've shown. Instead, I would use a String as input, and a StringBuilder as output, appending text segments to the output StringBuilder as I scan the input String. That way the code avoids having to repeatedly shift your character buffer in the StringBuilder (which happens _twice_ for each replacement in your code). That's exactly the kind of optimization I'd expect to find inside the .NET classes.
It might even be worthwhile to defer creation of the output StringBuilder until you detect the first match that needs to be replaced, if there's an expectation that for a significant frequency of input, no replacements would be needed.
It is probably a lot faster than using a regex, though I haven't done any measurements.
I would expect Regex to be on par with other explicit mechanisms like that, especially given the need to count the replacements (which for non-Regex solutions requires replacing the search text twice). If performance is an issue, then a "scan and build" approach as I suggest above is probably slightly faster than using built-in Replace() methods simply because you can incorporate a count into the replacement logic.
All that said, if performance is an issue (and there's nothing in the OP to suggest it is), the only way to know for sure what the best solution is would be to try the different alternatives and measure them. Even theoretical advantages and disadvantages may be irrelevant for a typical data set, and intuition is a terrible way to measure performance. :)
For best performance, it may be that none of the suggestions offered so far are probably suitable. There's an optimized text search algorithm, the name of which I can't recall at the moment, that can probably be adapted, but if not then a degenerate state-graph implementation (since there's only one string to search for) would probably work too. Either approach would avoid having to keep performing full string comparisons at each character index in the original string (consider an original string "aaaaaaaaaaaaaaaaaaaaaaaaaa" where you want to replace all occurrences of "aaaaaab" with something :) ).
But even there, as I said, there's no way to know for sure without measuring. Performance of the various choices is to some extent going to be data dependent; liabilities that exist in the general case might not really be that much of a problem. For example, if dealing with essentially random data, it's not too terrible to just keep comparing over and over at an incremented index, because those comparisons will normally terminate quickly when there's no match.
In other words, even the theoretically worst-case implementation might not turn out to be much different than the more optimized ones.
Until there's a performance problem shown, the OP should stick with whatever solution _reads_ the best, and is the most maintainable. And if there is a performance problem shown, measuring each viable alternative is the only way to know for sure which will be fastest.
Pete
.
- Follow-Ups:
- Re: Replace strings in a text file and get the number ofreplacementsmade
- From: Jesse Houwing
- Re: Replace strings in a text file and get the number ofreplacementsmade
- References:
- Re: Replace strings in a text file and get the number of replacements made
- From: Peter Duniho
- Re: Replace strings in a text file and get the number of replacementsmade
- From: Jesse Houwing
- Re: Replace strings in a text file and get the number of replacements made
- Prev by Date: Re: Replace strings in a text file and get the number of replacementsmade
- Next by Date: Need advice on Class. Thank You
- Previous by thread: Re: Replace strings in a text file and get the number of replacementsmade
- Next by thread: Re: Replace strings in a text file and get the number ofreplacementsmade
- Index(es):
Relevant Pages
|