Re: Fast way to determine if a string contains a member of a list of strings

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



WOW! Really disappointing timings for the state machine method. Here are the
results:

Method: Looping through 359 strings doing an if (IndexOf > 0) and a source
string of length = 47
Execution time: 0.055008 seconds.


Method: Using the StateMachine method on 359 strings and a source string of
length = 47
Execution time: 2.359223 seconds.

In my case, I can terminate the search as soon as ONE of the strings is
found because the list is mutually exclusive. In the sample code that you
provided, at what point do I know I have a match and can terminate the
traversal?

Also, to answer your question, I do not have space delimiters - the strings
can span multiple words, etc.

"Peter Duniho" <NpOeStPeAdM@xxxxxxxxxxxxxxxx> wrote in message
news:op.t7lw7dew8jd0ej@xxxxxxxxxxxxxxxxxxxxxxx
On Thu, 06 Mar 2008 10:08:22 -0800, Karch <news.microsoft.com> wrote:

I think the StateGraph approach is going to work best. I have an
implementation up and running and should be able to get some timings done
today.

Have you tried the dictionary approach suggested by a couple of others?

I wasn't paying attention when I first read your question and failed to
note that your search is delimited by spaces. Or, at least in the
examples you provided it was.

I think that if your data is actually restricted like that, the dictionary
lookup might be as fast as a state graph and it would be a lot simpler in
terms of implementation.

Just a thought. Obviously I really like state graphs :), but when there
is information you know about the input data that allows you to constrain
the problem a bit (e.g. by using spaces to identify the beginning and
ending of any possible match), it's often possible to solve the problem
efficiently without using a state graph.

Pete


.



Relevant Pages

  • Re: Ada Shootout program for K-Nucleotide (patches)
    ... execution time. ... and this particular hot spot had been cooled down twice: ... constrained strings of suitable fixed length ... function and a cute string equality function. ...
    (comp.lang.ada)
  • Re: Ada Shootout program for K-Nucleotide (patches)
    ... Ludovic Brenta wrote: ... execution time. ... constrained strings of suitable fixed length ... Operator subprograms seem to confuse the profiling programs, ...
    (comp.lang.ada)
  • Re: Faster datastructure for lookups wanted
    ... My code spends most of its execution time doing lookups from ... The keys are strings and the values ... I do not care how long the construction of the datastructure takes, ... Maybe it is worth trying symbols instead of strings? ...
    (comp.lang.ruby)
  • Re: efficiency question - nested function cals
    ... On 10,000 iterations the execution time is virtually the ... good* reason not to. ... bottleneck in your code is due to one micro-optimisation. ... Even that decision is only relevant when you know the number of strings ...
    (microsoft.public.dotnet.languages.csharp)