Re: Search for multiple things in a string



tshad <tscheiderich@xxxxxxxxxxxxxxx> wrote:
> > Regular expressions have nothing to do with ASP.NET - they're a part of
> > "normal" .NET.
>
> Actually, you're right.
>
> But that was my point.
>
> Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
> So using Regex is not really like using another language (as C# is different
> from VB.Net).

It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.

> But the discussion was valid in that you use the best tool for the situation.

Indeed.

> >> As far as readability, it has nothing to do with Regular Expressions
> >> whether
> >> it is readable or not, as Oliver mentions, but how you write it.
> >
> > No - I believe that searching for "jon.skeet" with IndexOf is clearer
> > than searching for "jon\\.skeet" or @"jon\.skeet".
>
> That's maybe true. But it would be clear to someone used to using both C#
> and Regex.

But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?

> Also, you have the same problem when dealing with web pages or getting a
> file from the disk. You still use the escape character there (and as you
> say, is a little confusing) - but you still do it.

You have to know the C# escaping, but not the regular expression
escaping.

> >> You can also make some pretty unreadable C# code as well.
> >
> > Sure, but that's no reason to use regular expressions just to make
> > things worse.
>
> I agree with you that readability is important.
>
> It used to be that people didn't like C and C++ for exactly the same reason
> you point out. The code was not as clear as COBOL or Basic and that was the
> complaint back then. I happened to be a Fortran programmer at that time and
> was not interested in moving to C for that reason (not that Fortran was
> better - readability wise).
>
> The problem with C back then was that much of the code was
> really cryptic. But it didn't have to be; that was just how people coded
> back then. Mainly, it was important to make the most efficient code
> possible because of the limited computing power and efficient rarely equates
> to readable. And I am not even talking about compiling and linking and all
> the options and cryptic command lines.

To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.

> > Yes, but it's the programmer's decision how to approach things -
> > whether you do things the simple way or the complex way. You *could*
> > implement the string search by manually iterating over all the
> > characters in the string, perhaps even writing your own state machine
> > to do it. The code could be pretty readable considering what it's doing
> > - but it's *bound* to be more complex than using IndexOf.
>
> I agree.
>
> Just because you can - doesn't mean you should.

Exactly.

> > Sure - but why introduce unnecessary complexity? You're already
> > writing C#, so you'd better know C# - but why add regular expressions
> > into the mix when they're unnecessary?
>
> But if you know both and as I (and you) mentioned regex is part of .net as
> is C# - so it is already in the mix.

No, it's not. It's not already used in every single C# program, any
more than SQL is.

> But you're right, don't introduce any
> more complexity than necessary. But if it's 6 of one ... it's really up to
> the programmer.

In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.
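
To make the escaping point concrete, here is a minimal sketch. The SearchDemo class and its method names are hypothetical and only for illustration; the calls themselves (String.IndexOf, Regex.IsMatch, Regex.Escape) are standard .NET. It shows that an unescaped dot silently changes what the regex matches, whereas IndexOf always treats the search string literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the thread.
public static class SearchDemo
{
    // Plain substring search: the needle is always taken literally.
    public static bool ContainsLiteral(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    // Regex search: the pattern is interpreted as a regular expression,
    // so characters such as '.' have a special meaning unless escaped.
    public static bool ContainsPattern(string haystack, string pattern)
    {
        return Regex.IsMatch(haystack, pattern);
    }

    public static void Main()
    {
        string text = "mail jonXskeet today";

        Console.WriteLine(ContainsLiteral(text, "jon.skeet"));               // False
        Console.WriteLine(ContainsPattern(text, "jon.skeet"));               // True: '.' matches any character
        Console.WriteLine(ContainsPattern(text, @"jon\.skeet"));             // False
        Console.WriteLine(ContainsPattern(text, Regex.Escape("jon.skeet"))); // False
    }
}
```

Regex.Escape removes the need to remember which characters are special, but it is still one more thing the maintainer has to know about.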

> In the original case, that was what it was. You can't tell
> me that you feel that the solution suggested for this case was even close to
> being unreadable (if you are even a stone's throw from understanding Regular
> Expressions).

It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.

> I personally feel that both solutions are equally usable and readable (in
> this situation).

I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?

> I have also seen times when I just couldn't find an easy solution in C# or
> VB and it was fairly easy in Regex.

Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.

> I myself would usually opt for the C# or VB solutions first, but would have
> no problem using Regex. As a matter of fact, I use Regex to strip commas
> and $ from my textbox fields before writing it to SQL as it was the best
> solution I could find. Such as:
>
> SalaryMax.Text =
> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
>
> At the time, I couldn't seem to find as simple a solution as this in VB.Net
> so I use this (not saying there isn't one).

And of course there is:
SalaryMax.Text =
String.Format("{0:c}", CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...
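
As a rough, self-contained comparison of the two approaches, the sketch below strips "$" and "," from a string both ways. StripDemo and its method names are hypothetical; CalculateYearly and the textbox fields from the original code are left out because their definitions aren't shown in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the thread.
public static class StripDemo
{
    // Chained String.Replace calls: each argument is a literal string.
    public static string StripWithReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }

    // Regex version: '$' must be escaped because it is an anchor in regex syntax.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", "");
    }

    public static void Main()
    {
        string wages = "$1,234,567.89";

        Console.WriteLine(StripWithReplace(wages)); // 1234567.89
        Console.WriteLine(StripWithRegex(wages));   // 1234567.89
    }
}
```

Both produce the same result here; the difference is only in how much the reader needs to know to be sure of that.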

> > And that's the point - I don't think this problem *does* warrant it.
>
> I agree that it isn't necessary here, but I don't think it is warranted or
> unwarranted here. I think it's just as readable either way.

But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.

> > So do you add a database when you just need to do a hashtable lookup,
> > just in case you forget SQL? Do you use reflection to get at the value
> > of a property, just in case you forget how to use that? I hope not.
>
> Of course not. But as was mentioned there are times where Regex may be a
> good solution and if you can do it either way, why not.

Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.

> > It's very important to use appropriate technology, rather than using it
> > for the sake of it. (It's one thing to experiment with technology for
> > the sake of it as a learning tool, but I wouldn't do it in production
> > code.)
>
> Right. But Regex is not inappropriate technology. As you said, trying to
> loop through each character when there is an easier way is a bit much.

As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.

> But Regex is valid and is an appropriate method for handling strings and if
> you are as comfortable with one as the other then it isn't inappropriate.
> It's all in how you use it. And I was not saying experiment with it. I was
> saying using it for the sake of staying familiar with it. I don't want to
> need to use it and have to figure it out when I need to use it.

Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.

> As you said. Use the appropriate tool. If the appropriate tool is Regex,
> it is going to be d... inconvenient to need it and not know how to use it.

I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.

> Now I am not saying go out and learn every tool out there. But if it is a
> valid tool in your particular environment, and it is available - why would
> you not avail yourself of it?

Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?

--
Jon Skeet - <skeet@xxxxxxxxx>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too