Re: Regular Expression, to use or not to use...

From: Niki Estner (niki.estner_at_cube.net)
Date: 08/20/04


Date: Sat, 21 Aug 2004 01:10:29 +0200


"Tom" <junkmale48@hotmail.com> wrote in
news:63d1c9f0.0408201000.6324ef03@posting.google.com...
> I have struggled with the issue of whether or not to use Regular
> Expressions for a long time now, and after implementing many text
> manipulating solutions both ways, I've found that writing specialized
> code instead of an RE is almost always the better solution. Here is
> why....

> RE's are complex.

Yesterday somebody asked a string-matching question in the C# newsgroup. I've
copied two of the answers here:
The plain C# version:

   // The start Position
   int Start = InputName.IndexOf(":")+1;
   // The end position
   int End = InputName.IndexOf(" T");
   // The output
   string OutputName = InputName.Substring(Start,End - Start);
   // Clear off leading / trailing spaces
   OutputName = OutputName.Trim();

The Regex version:
   // Capture everything between "From: " and " To:"
   Match m = Regex.Match(inputString, "From: (.*) To:");
   if (m.Success)
     OutputName = m.Groups[1].Value;  // .Value yields the captured string

I may be a bad C# programmer, but if I read the former piece of source code
I wouldn't have a clue what it is doing, although every line is commented
and the code is rather simple.
The RE version, on the other hand, seems pretty self-explanatory to me.

> Plus it is not like writing this "one line" of code is saving you any
> time.

It does.
Look at the above code snippets and think about how much time the poor
plain-C#-guy will spend debugging, trying to figure out why people like
"James T. Kirk" don't get their email, and why his program sometimes simply
crashes...
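
To make that concrete, here is a minimal sketch of both failure modes. It
assumes the input looks like "From: <name> To: <name>" (inferred from the
regex above; the original question isn't quoted here), and the class and
method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class NameExtraction
{
    // Same logic as the plain C# answer quoted above.
    public static string WithIndexOf(string input)
    {
        int start = input.IndexOf(":") + 1;
        int end = input.IndexOf(" T");
        return input.Substring(start, end - start).Trim();
    }

    // Same pattern as the Regex answer; returns null when nothing matches.
    public static string WithRegex(string input)
    {
        Match m = Regex.Match(input, "From: (.*) To:");
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Assumed input format; " T" first occurs in " T. Kirk", not in " To:".
        string kirk = "From: James T. Kirk To: Spock";
        Console.WriteLine(WithIndexOf(kirk)); // James
        Console.WriteLine(WithRegex(kirk));   // James T. Kirk

        // Input without ":" or " T": the regex reports "no match",
        // while Substring gets a negative length and throws.
        string malformed = "no headers here";
        Console.WriteLine(WithRegex(malformed) ?? "(no match)");
        try
        {
            WithIndexOf(malformed);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("IndexOf version threw");
        }
    }
}
```

The plain version can of course be fixed (search for " To:" instead of " T",
check for -1), but then it is no longer four lines.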

> I have
> had really complex RE's that took me hours to write that I needed
> external tools to help me with. Sure it was only one line of code in
> the end, but I could have written a page of easy to follow code in
> half the time. Which brings me to my next point.

Interesting. Could you post such a RE, and the plain code?

> RE's are not debug-able. If I have a page of well written code I (or
> anyone else) can easily step through it.

Putting it in a RegEx testing program like Expresso and removing parts of it
usually does a similar job.
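
The same "remove parts until it matches" approach also works without a tool.
A rough sketch, assuming you split the pattern by hand into progressively
longer pieces that are each valid on their own (the helper name and the
sample input are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexBisect
{
    // Returns the index of the first pattern that fails to match,
    // or -1 if every pattern matches.
    public static int FirstFailing(string input, string[] patterns)
    {
        for (int i = 0; i < patterns.Length; i++)
        {
            if (!Regex.IsMatch(input, patterns[i]))
                return i;
        }
        return -1;
    }

    static void Main()
    {
        // Sample input with a lowercase "to:", so the full pattern fails.
        string input = "From: James T. Kirk to: Spock";

        // Hand-picked pieces of "From: (.*) To:", shortest first.
        string[] steps = { "From: ", "From: (.*)", "From: (.*) To:" };

        int failing = FirstFailing(input, steps);
        if (failing < 0)
            Console.WriteLine("all pieces match");
        else
            Console.WriteLine("first failing piece: " + steps[failing]);
    }
}
```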

> When I send my 150 char RE
> into the RE engine it is a black box. I am just left to wonder why it
> didn't work.

On the other hand, you usually can't copy C# code to some other program and
simply run it there, because it is coupled with other parts of the project.

> ...
> It's difficult to do some things in RE. RE's work great for some
> search but not so well for others.

That applies to pretty much any tool I can think of...

> ...
> Finally I don't know what you guys are saying about RE ever being
> faster. In my experience RE's are slow, very slow. Like on the
> order of 10 times slower than straightforward string parsing code.

Depends on what you're doing, and how you're doing it. If you are searching
for a long pattern like "Thomas Jefferson" in a long string (> 100
characters), the RE is about 10 times faster than a culture-invariant
IndexOf.
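
If you want to check this on your own machine, something like the following
would do. It is only a rough timing harness, not a rigorous benchmark: the
haystack contents, its length and the iteration count are arbitrary choices
of mine, and the numbers will vary with the runtime version and the input.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class SearchTiming
{
    // Builds filler text of at least fillerLength chars with the needle at the end.
    public static string BuildHaystack(string needle, int fillerLength)
    {
        var sb = new StringBuilder();
        while (sb.Length < fillerLength)
            sb.Append("lorem ipsum dolor sit amet ");
        sb.Append(needle);
        return sb.ToString();
    }

    // Runs the action the given number of times and returns elapsed milliseconds.
    public static long Time(int iterations, Action action)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const string needle = "Thomas Jefferson";
        const int iterations = 10000;                    // arbitrary
        string haystack = BuildHaystack(needle, 10000);  // arbitrary length
        var regex = new Regex(Regex.Escape(needle), RegexOptions.Compiled);

        Console.WriteLine("Regex:                    {0} ms",
            Time(iterations, () => regex.Match(haystack)));
        Console.WriteLine("IndexOf InvariantCulture: {0} ms",
            Time(iterations, () => haystack.IndexOf(needle, StringComparison.InvariantCulture)));
        Console.WriteLine("IndexOf Ordinal:          {0} ms",
            Time(iterations, () => haystack.IndexOf(needle, StringComparison.Ordinal)));
    }
}
```

The ordinal IndexOf is included for comparison because the kind of
comparison IndexOf performs is part of "how you're doing it".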

> For me this is the final kicker. Originally I went through all the
> trouble of using RE's in all my complex text parsing code because I
> thought it would be faster. This could be a result of the .NET engine;
> regardless, I was really disappointed by this, causing me to roll back a
> lot of my solutions.

If it's in the 10% of the code that uses 90% of the time, you should of
course choose the fastest code you can get.

> With all this said I still use REs sometimes, but only for really
> simple string operations where I can come up with the expression in a
> couple of seconds I don't care about performance and don't feel like
> writing 5-10 lines of code.

So what?

> To me it is kind of like a quick hack.

A quick hack with full error handling, that's usually much more stable and
robust than those untested "5-10 lines of code"...

Niki


