Re: Fast String Case Insensitive IndexOf

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Justin Rogers (Justin_at_games4dotnet.com)
Date: 05/26/04


Date: Wed, 26 May 2004 16:09:44 -0700

One final note:

IndexOf on a ToUpper performs at 380-400 ticks
Brute at 150-170 ticks
CultureInvariant CompareInfo comes in at 620-650 ticks
Forward Search (my algorithm going forward through the string) at 100-130 ticks
Backward Search (my algorithm going backward through the string) at 10-20 ticks

TickCount appears to be the incorrect timing metric, but I've stuck with it for
completeness.
Normally, I try to use DateTime.Now and work the loop count so that the fastest
algorithm
takes about 1 second. In the case of this testing, scaling the backward search
to a second
would probably mean some very long execution times for the other methods. This
is expected.

-- 
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
"Justin Rogers" <Justin@games4dotnet.com> wrote in message
news:uykCdV3QEHA.1388@TK2MSFTNGP09.phx.gbl...
> To note, your algorithms are a lot faster if you remove the textBox1.Text
access
>
> If I move the access to above the loop, I get 160 ticks instead of 2500 or so.
> private void button5_Click(object sender, System.EventArgs e)
> {
>     int D = Environment.TickCount;
>     int Cnt = 0;
>     string text1 = textBox1.Text;
>     string text2 = textBox2.Text;
>     for (int I = 0; I < 100000; I++)
>     {
>         if (BruteSearch(text1, text2) > 0) Cnt++;
>     }
>     D = Environment.TickCount - D;
>     listBox1.Items.Add("Brute: " + D.ToString() + " Hits: " + Cnt.ToString());
> }
>
> Also note that you are cheating in your algorithms by going backwards, since
you
> know
> history is at the end of the string.  Here is a method that will go forward or
> backward
> and will outperform your unsafe version.  The results in the end were:
>
> BruteSearch: 170
> BruteSearch2(false): 140
> BruteSearch2(true): 20
>
>         private int BruteSearch2(string source, string sub, bool backwards){
>             if ( sub != null && source != null && sub.Length > 0 && sub.Length
> <= source.Length ) {
>                 char searchChar = MapTable[sub[0]];
>
>                 if (backwards) {
>                     for(int i = (source.Length - sub.Length); i > 0; i--) {
>                         if ( MapTable[source[i]] == searchChar ) {
>                             for(int j = 1; j < sub.Length; j++) {
>                                 if ( MapTable[source[i+j]] !=
MapTable[sub[j]] )
> { goto CONTINUEBCK; }
>                             }
>
>                             return i;
>                         }
>                         CONTINUEBCK:;
>                     }
>                 } else {
>                     for(int i = 0; i < (source.Length - sub.Length + 1); i++)
{
>                         if ( MapTable[source[i]] == searchChar ) {
>                             for(int j = 1; j < sub.Length; j++) {
>                                 if ( MapTable[source[i+j]] !=
MapTable[sub[j]] )
> { goto CONTINUEFWD; }
>                             }
>
>                             return i;
>                         }
>                         CONTINUEFWD:;
>                     }
>                 }
>             }
>
>             return -1;
>         }
>
>
>
>
> -- 
> Justin Rogers
> DigiTec Web Consultants, LLC.
> Blog: http://weblogs.asp.net/justin_rogers
>
> "Justin Rogers" <Justin@games4dotnet.com> wrote in message
> news:O$lPsE3QEHA.4000@TK2MSFTNGP10.phx.gbl...
> > Regular expression state machines can NOT be anywhere near as fast as
> > an optimized IndexOf operation.  Any case-insensitive options you apply
> > will be passed right back to the CultureInfo comparers (slow), the patterns
> > are optimized for generality, and not speed.  They additionally create quite
> > a few objects for storage in the back-end, which will put a hurting on your
> > GC.
> >
> > I think the reason the unsafe code isn't performing is the use of the three
> > fixed statements.  These operations take time (pinning and such), and in
> > many cases you are better off not using unsafe code.  Also, in the example
> > provided, you aren't using the PMapTable pointer, instead you are indexing
> > MapTable as standard array still.
> >
> > There are many ways to make your previously posted method quite a bit
> > faster.  I'll take a look over the next couple of days and try to find out
> where
> > the bottlenecks are based on your test application.
> >
> >
> > -- 
> > Justin Rogers
> > DigiTec Web Consultants, LLC.
> > Blog: http://weblogs.asp.net/justin_rogers
> >
> >
> > "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote in
> > message news:%23QG97j1QEHA.1388@TK2MSFTNGP09.phx.gbl...
> > > Alexander,
> > >
> > >     Have you thought of using a regular expression perhaps?  The .NET
Regex
> > > parser will compile the expressions for speed, and you might see an
> > > improvement if you use this.
> > >
> > >     Hope this helps.
> > >
> > >
> > > -- 
> > >                - Nicholas Paldino [.NET/C# MVP]
> > >                - mvp@spam.guard.caspershouse.com
> > >
> > > "Alexander Muylaert" <Remove_amg@pandora.be> wrote in message
> > > news:eWO8qO1QEHA.808@tk2msftngp13.phx.gbl...
> > > > Hi
> > > >
> > > > I'm still working on a faster alternative for a case insensitive
IndexOf.
> > > I
> > > > need this a lot.  So it better be as fast as it can get.
> > > > My function also ignores accents.
> > > >
> > > > I'm still not very happy but on my CPU
> > > >
> > > > 100,000 string searches
> > > >
> > > > ToUpper.IndexOf    ==> 3,224 ticks
> > > > Culture IndexOf  ==> 2444 ticks
> > > > BruteForce IndexOf ==> 2013 ticks
> > > >
> > > > Is there someone who could even make this function any faster.  Perhaps
a
> > > > less brute algorithm?
> > > > In a real world app this is still way to slow for me.  I rather stick
with
> > > > Managed C#.  I already use Unsafe in my BruteForce function.
> > > >
> > > > Included the test application
> > > >
> > > > Kind regards and many thanks in advance.
> > > >
> > > > Alexander
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>


Relevant Pages

  • Counter Intuitive Results: Optimising an FFT routine for Speed
    ... FFTW 3.0.1 is the latest official version of FFTW ... 20ms ticks and counting in between I could improve resolution to ... algorithm that looks like it came from the Cooley-Tukey paper. ... > It's pretty easy to tell when you've confused the optimizer. ...
    (comp.lang.c)
  • System.Random Algorithm
    ... What´s the algorithm that the .NET System.Random namespace uses? ... Ticks, TickCount, or something else. ...
    (microsoft.public.dotnet.framework)