Re: Fast String Case Insensitive IndexOf
From: Justin Rogers (Justin_at_games4dotnet.com)
Date: 05/26/04
- Next message: I.P. Freely: "Access hardware directly via C#?"
- Previous message: Alex Moskalyuk: "Re: Chinese Syntax"
- In reply to: Justin Rogers: "Re: Fast String Case Insensitive IndexOf"
- Next in thread: William Stacey [MVP]: "Re: Fast String Case Insensitive IndexOf"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 26 May 2004 16:09:44 -0700
One final note:
IndexOf on a ToUpper performs at 380-400 ticks
Brute at 150-170 ticks
CultureInvariant CompareInfo comes in at 620-650 ticks
Forward Search (my algorithm going forward through the string) at 100-130 ticks
Backward Search (my algorithm going backward through the string) at 10-20 ticks
TickCount appears to be the incorrect timing metric, but I've stuck with it for
completeness.
Normally, I try to use DateTime.Now and work the loop count so that the fastest
algorithm
takes about 1 second. In the case of this testing, scaling the backward search
to a second
would probably mean some very long execution times for the other methods. This
is expected.
-- Justin Rogers DigiTec Web Consultants, LLC. Blog: http://weblogs.asp.net/justin_rogers "Justin Rogers" <Justin@games4dotnet.com> wrote in message news:uykCdV3QEHA.1388@TK2MSFTNGP09.phx.gbl... > To note, your algorithms are a lot faster if you remove the textBox1.Text access > > If I move the access to above the loop, I get 160 ticks instead of 2500 or so. > private void button5_Click(object sender, System.EventArgs e) > { > int D = Environment.TickCount; > int Cnt = 0; > string text1 = textBox1.Text; > string text2 = textBox2.Text; > for (int I = 0; I < 100000; I++) > { > if (BruteSearch(text1, text2) > 0) Cnt++; > } > D = Environment.TickCount - D; > listBox1.Items.Add("Brute: " + D.ToString() + " Hits: " + Cnt.ToString()); > } > > Also note that you are cheating in your algorithms by going backwards, since you > know > history is at the end of the string. Here is a method that will go forward or > backward > and will outperform your unsafe version. The results in the end were: > > BruteSearch: 170 > BruteSearch2(false): 140 > BruteSearch2(true): 20 > > private int BruteSearch2(string source, string sub, bool backwards){ > if ( sub != null && source != null && sub.Length > 0 && sub.Length > <= source.Length ) { > char searchChar = MapTable[sub[0]]; > > if (backwards) { > for(int i = (source.Length - sub.Length); i > 0; i--) { > if ( MapTable[source[i]] == searchChar ) { > for(int j = 1; j < sub.Length; j++) { > if ( MapTable[source[i+j]] != MapTable[sub[j]] ) > { goto CONTINUEBCK; } > } > > return i; > } > CONTINUEBCK:; > } > } else { > for(int i = 0; i < (source.Length - sub.Length + 1); i++) { > if ( MapTable[source[i]] == searchChar ) { > for(int j = 1; j < sub.Length; j++) { > if ( MapTable[source[i+j]] != MapTable[sub[j]] ) > { goto CONTINUEFWD; } > } > > return i; > } > CONTINUEFWD:; > } > } > } > > return -1; > } > > > > > -- > Justin Rogers > DigiTec Web Consultants, LLC. > Blog: http://weblogs.asp.net/justin_rogers > > "Justin Rogers" <Justin@games4dotnet.com> wrote in message > news:O$lPsE3QEHA.4000@TK2MSFTNGP10.phx.gbl... > > Regular expression state machines can NOT be anywhere near as fast as > > an optimized IndexOf operation. Any case-insensitive options you apply > > will be passed right back to the CultureInfo comparers (slow), the patterns > > are optimized for generality, and not speed. They additionally create quite > > a few objects for storage in the back-end, which will put a hurting on your > > GC. > > > > I think the reason the unsafe code isn't performing is the use of the three > > fixed statements. These operations take time (pinning and such), and in > > many cases you are better off not using unsafe code. Also, in the example > > provided, you aren't using the PMapTable pointer, instead you are indexing > > MapTable as standard array still. > > > > There are many ways to make your previously posted method quite a bit > > faster. I'll take a look over the next couple of days and try to find out > where > > the bottlenecks are based on your test application. > > > > > > -- > > Justin Rogers > > DigiTec Web Consultants, LLC. > > Blog: http://weblogs.asp.net/justin_rogers > > > > > > "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote in > > message news:%23QG97j1QEHA.1388@TK2MSFTNGP09.phx.gbl... > > > Alexander, > > > > > > Have you thought of using a regular expression perhaps? The .NET Regex > > > parser will compile the expressions for speed, and you might see an > > > improvement if you use this. > > > > > > Hope this helps. > > > > > > > > > -- > > > - Nicholas Paldino [.NET/C# MVP] > > > - mvp@spam.guard.caspershouse.com > > > > > > "Alexander Muylaert" <Remove_amg@pandora.be> wrote in message > > > news:eWO8qO1QEHA.808@tk2msftngp13.phx.gbl... > > > > Hi > > > > > > > > I'm still working on a faster alternative for a case insensitive IndexOf. > > > I > > > > need this a lot. So it better be as fast as it can get. > > > > My function also ignores accents. > > > > > > > > I'm still not very happy but on my CPU > > > > > > > > 100,000 string searches > > > > > > > > ToUpper.IndexOf ==> 3,224 ticks > > > > Culture IndexOf ==> 2444 ticks > > > > BruteForce IndexOf ==> 2013 ticks > > > > > > > > Is there someone who could even make this function any faster. Perhaps a > > > > less brute algorithm? > > > > In a real world app this is still way to slow for me. I rather stick with > > > > Managed C#. I already use Unsafe in my BruteForce function. > > > > > > > > Included the test application > > > > > > > > Kind regards and many thanks in advance. > > > > > > > > Alexander > > > > > > > > > > > > > > > > > > > > > > > >
- Next message: I.P. Freely: "Access hardware directly via C#?"
- Previous message: Alex Moskalyuk: "Re: Chinese Syntax"
- In reply to: Justin Rogers: "Re: Fast String Case Insensitive IndexOf"
- Next in thread: William Stacey [MVP]: "Re: Fast String Case Insensitive IndexOf"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|