Re: Efficient regular expression pattern ?
- From: Barry Kelly <barry.j.kelly@xxxxxxxxx>
- Date: Wed, 14 Jun 2006 14:18:49 +0100
"Steve B." <steve_beauge@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> I'm building an application that analyses a stream of URLs in order to
> detect certain pages.
> I have a very large list of regular expressions (up to several thousand)
> that I have to check against every URL.
Can you show us some of your regular expressions? Are many of them
simple text matches, or do they all involve wildcards? Are they using ()
which end up capturing when they don't need to (consider the
ExplicitCapture option)? Do you use the Compiled option?
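A minimal sketch of those points, written in Python's `re` module rather than .NET's Regex class (an assumption, so the details differ). It compiles each pattern once up front instead of per URL. It uses a non-capturing `(?:...)` group where only a yes/no answer is needed, which is roughly what ExplicitCapture does for plain `(...)` groups in .NET. It also sends patterns with no metacharacters to a plain substring test. `build_matchers`, `matching_indices` and the sample patterns are hypothetical. Note that `re.compile` only avoids re-parsing; it is not the same thing as .NET's Compiled option.

```python
import re
from typing import Callable, List

def build_matchers(patterns: List[str]) -> List[Callable[[str], bool]]:
    """Turn each pattern into a predicate, doing the expensive work once."""
    matchers: List[Callable[[str], bool]] = []
    for p in patterns:
        if re.escape(p) == p:
            # No regex metacharacters (Python 3.7+ re.escape only escapes
            # special characters): a plain substring test is enough.
            matchers.append(lambda url, lit=p: lit in url)
        else:
            # Compile once and reuse the compiled object for every URL.
            rx = re.compile(p)
            matchers.append(lambda url, rx=rx: rx.search(url) is not None)
    return matchers

# Hypothetical patterns: one literal, one real regex whose group is
# non-capturing, since only the yes/no answer is needed.
PATTERNS = [
    "/checkout",
    r"^https?://[^/]+/(?:shop|store)/",
]
MATCHERS = build_matchers(PATTERNS)

def matching_indices(url: str) -> List[int]:
    """Indices of every pattern that matches `url`."""
    return [i for i, m in enumerate(MATCHERS) if m(url)]

if __name__ == "__main__":
    print(matching_indices("http://example.com/shop/checkout"))  # [0, 1]
    print(matching_indices("http://example.com/about"))          # []
```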
It can be quite easy to write regular expressions with pathological
(sometimes exponential) backtracking behaviour. For example, an RE that
is supposed to match the whole string and looks like ".*/foo/" will be
much slower than "^.*/foo/", because without the anchor a URL that
doesn't match makes the matcher retry the leading .* from every starting
position. Also, the current .NET regular expression matcher doesn't
perform some optimizations that many other matchers do.
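To check the anchoring point for yourself, here is a hypothetical Python sketch (Python's `re`, not .NET, so absolute timings will differ). It confirms that ".*/foo/" and "^.*/foo/" accept the same URLs, assuming URLs contain no newlines, since `.` does not match `\n`. It then times both on a long URL that does not match. It only prints the timings and makes no claim about the exact ratio.

```python
import re
import timeit

UNANCHORED = re.compile(r".*/foo/")
ANCHORED = re.compile(r"^.*/foo/")

def same_verdict(urls):
    """True if both patterns accept or reject every URL identically
    (assumes the URLs contain no newline characters)."""
    return all(
        (UNANCHORED.search(u) is None) == (ANCHORED.search(u) is None)
        for u in urls
    )

def time_pattern(rx, url, repeat=20):
    """Seconds taken for `repeat` searches of `url` with `rx`."""
    return timeit.timeit(lambda: rx.search(url), number=repeat)

if __name__ == "__main__":
    samples = ["http://a.example/foo/x", "http://a.example/bar/", "/foo/"]
    print(same_verdict(samples))  # True
    miss = "http://a.example/" + "x" * 2000  # long URL with no "/foo/"
    print("unanchored:", time_pattern(UNANCHORED, miss))
    print("anchored:  ", time_pattern(ANCHORED, miss))
```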
> Each regular expression will be evaluated against every URL.
> How can I write the code so that it is as efficient as possible?
Profile it to find out what bits are slow. You might consider trying
each regular expression on a test set of URL data, and dumping out which
ones are matching slowest. That'll tell you which ones you need to
optimize.
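One way to do that, again sketched in Python as an assumption (in .NET, System.Diagnostics.Stopwatch would play the role of `time.perf_counter`). The helper times each compiled pattern over the whole test set and sorts by total time, slowest first. `slowest_patterns` and the sample data are hypothetical.

```python
import re
import time
from typing import List, Tuple

def slowest_patterns(patterns: List[str], urls: List[str],
                     top: int = 10) -> List[Tuple[float, str]]:
    """Return (total_seconds, pattern) for the `top` slowest patterns,
    slowest first, measured by searching every URL in the test set."""
    results: List[Tuple[float, str]] = []
    for p in patterns:
        rx = re.compile(p)  # compile outside the timed region
        start = time.perf_counter()
        for url in urls:
            rx.search(url)
        results.append((time.perf_counter() - start, p))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top]

if __name__ == "__main__":
    test_urls = ["http://example.com/shop/item?id=%d" % i for i in range(1000)]
    pats = [r"/shop/", r".*item\?id=\d+$", r"^https?://[^/]+/shop/"]
    for secs, pat in slowest_patterns(pats, test_urls):
        print(f"{secs:.6f}s  {pat}")
```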
> But I'm afraid this process is quite slow, since URLs can arrive at up
> to 10 per second.
How slow is it currently?
I imagine you should be able to check many tens of thousands of regular
expressions per second.
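As a rough check on that estimate, assume 5,000 patterns as a stand-in for "several thousand". At 10 URLs per second, that is 10 × 5,000 = 50,000 regex evaluations per second. The hypothetical Python sketch below computes that requirement and measures how many evaluations per second a machine actually manages with a given pattern set, so the two numbers can be compared. The generated patterns and URLs are placeholders, not real data.

```python
import re
import time
from typing import List

def required_rate(urls_per_sec: float, num_patterns: int) -> float:
    """Regex evaluations per second needed to keep up with the URL stream."""
    return urls_per_sec * num_patterns

def measured_rate(patterns: List[str], urls: List[str]) -> float:
    """Regex evaluations per second actually achieved on this machine."""
    compiled = [re.compile(p) for p in patterns]
    start = time.perf_counter()
    for url in urls:
        for rx in compiled:
            rx.search(url)
    elapsed = time.perf_counter() - start
    return len(urls) * len(compiled) / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    need = required_rate(10, 5000)
    print(need)  # 50000
    pats = [r"/page%d/" % i for i in range(5000)]  # placeholder patterns
    urls = ["http://example.com/page%d/index.html" % i for i in range(20)]
    have = measured_rate(pats, urls)
    print(f"need {need:,.0f}/s, measured {have:,.0f}/s")
```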
-- Barry
--
http://barrkel.blogspot.com/