Re: Tough (for me) regex case
From: Rob Perkins (rob_perkins_at_hotmail.com)
Date: 04/13/04
- Next message: nospam_at_CrLf.com: "ListBox SelectionMode Bug"
- Previous message: Gary Hunt: "Re: Problem with pages talking between two servers"
- Maybe in reply to: Rob Perkins: "Tough (for me) regex case"
- Next in thread: Matt Garrish: "Re: Tough (for me) regex case"
- Reply: Matt Garrish: "Re: Tough (for me) regex case"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 13 Apr 2004 18:25:12 GMT
[x-posted to m.p.d.f because it concerns the .NET Framework's regex-er
as well...]
"Matt Garrish" <matthew.garrish@sympatico.ca> wrote:
>Does it make a little more sense now why Microsoft's implementation is
>wrong?
I'm not ready to call it "wrong", but I'm getting close. OK, so we
start with:
/(?<!")"(?!")(.*?)(?<!")"(?!")/
Removing the lookahead and lookbehind stuff (in other words, not
worrying about the paired doublequote case), I get a pattern which reads:
/"(.*?)"/
...which includes the quotes in the match, in the .NET implementation.
In Perl, the quotes get consumed before the match is constructed. But
if I do this:
/".*?"/
Then the regex matches include the quote characters, in either
implementation. So apparently in the .NET implementation there is no
semantic difference between the two smaller cases.
And... now it begins to make a bit more sense. One implementor decided
there was no distinction in that difference. Another did.
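One way to see the distinction discussed above is to look at the whole
match and the capture group separately. The sketch below uses Python's
re module purely as a neutral stand-in (it is neither the Perl nor the
.NET engine, and the sample text is made up for illustration). It may be
that the difference between the two engines is mostly a matter of which
of these two values the calling code reads: a Perl match in list context
hands back the captures, while in .NET Match.Value is the whole match
and the capture sits in Match.Groups[1].

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = 'The "quick" brown fox'

# Same pattern with and without the capturing parentheses.
with_group = re.search(r'"(.*?)"', text)
without_group = re.search(r'".*?"', text)

whole_with = with_group.group(0)        # whole match, quotes included
capture = with_group.group(1)           # capture group only, quotes excluded
whole_without = without_group.group(0)  # whole match, quotes included

print(whole_with)     # "quick"
print(capture)        # quick
print(whole_without)  # "quick"
```

In this engine the whole match is identical for both patterns; the
parentheses only add a second value that can be read instead.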
It makes me wonder if this .NET implementation approach is shared by
other implementations. IOW, is the desirable (for my problem) behavior
unique to Perl 5, or is the undesirable behavior unique to .NET?
TMTOWTDI. But it represents a case which works desirably for me under
Perl, and generates a bit more work for me under the .NET Framework's
regex engine.
OK, so that leads me then to a case where this particular regex fails,
even in the Perl implementation. Consider the case of:
The "quick" brown "fox jumped ""over""" the lazy dog.
The desirable matches are:
quick
fox jumped ""over""
but this regex returns only
quick
If I stick whitespace between the second and third quote after "over"
then it returns:
quick
fox jumped ""over""<space>
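The two results above can be reproduced outside Perl. This is a minimal
sketch in Python's re module, assuming its lookaround and lazy-quantifier
semantics agree with Perl's for this pattern (they appear to here). The
names PATTERN, sample and spaced are only for illustration.

```python
import re

# The lookaround pattern from the top of this post, unchanged.
PATTERN = re.compile(r'(?<!")"(?!")(.*?)(?<!")"(?!")')

# The test sentence, and the variant with a space inserted between the
# second and third quote after "over".
sample = 'The "quick" brown "fox jumped ""over""" the lazy dog.'
spaced = 'The "quick" brown "fox jumped ""over"" " the lazy dog.'

# findall returns the contents of the single capture group per match.
sample_hits = PATTERN.findall(sample)
spaced_hits = PATTERN.findall(spaced)

print(sample_hits)  # ['quick']
print(spaced_hits)  # ['quick', 'fox jumped ""over"" ']
```

The second string fails in the first case because every quote after
"fox" touches another quote, so none of them can satisfy both the
lookbehind and the lookahead required of a closing quote.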
Again, the plain-English description is "all text between a pair of
doublequote characters, except that paired doublequotes inside a
quoted string are part of the match."
What do you think the regex will be?
Rob
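For what it's worth, one candidate that fits the plain-English
description is to drop the lookarounds and instead let the body of the
match be "any run of non-quote characters or doubled quotes". This is
only a sketch, not an answer from the thread, and it assumes the quotes
in the input are balanced and that a doubled quote never appears outside
a quoted string. It is written in Python's re module; the pattern uses
nothing beyond a character class, alternation and a non-capturing
group, so it should carry over to Perl and .NET, but that is untested
here. The name QUOTED is hypothetical.

```python
import re

# An opening quote, then any mix of non-quote characters and doubled
# quotes (captured), then a closing quote.
QUOTED = re.compile(r'"((?:[^"]|"")*)"')

sample = 'The "quick" brown "fox jumped ""over""" the lazy dog.'

# findall returns the capture group, so the delimiting quotes are dropped.
fields = QUOTED.findall(sample)

print(fields)  # ['quick', 'fox jumped ""over""']
```

Because the two alternatives inside the group can never match at the
same position, the engine has no ambiguous choices to backtrack over.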