Re: Regular Expression help

Okay, you're playing a little fast and loose with your rules, so let me
start with my best guess as to what they are:

First, you state that you want to match "all strings of the form..." This
does not make it clear whether these "strings" have multiple lines in them.
But based upon your syntax, which employs the '^' (Start of string) and '$'
(end of string) characters, I'm going to assume that perhaps you will be
working with multi-lined text (otherwise, at least one of the start of
string and end of string characters would be irrelevant). Based upon that
assumption, my derived rule begins thus:

A match begins at the start of a line, and ends at the end of a line (not
only at the start and end of a string)

Next, your regular expression indicates that any non-line-break character is
a match, *as long as* it is followed by [the string "uses " *not* followed
by the string "a spoon" *but is* followed by some other characters], and
ending with a period.

This breaks down into the following rule set:

1. A match begins at the beginning of a line.
2. It begins with at least one non-line-break character.
3. That sequence of one or more characters *must* be followed by the
literal "uses ".
4. The literal "uses " *must not* be followed by the literal "a spoon".
5. The literal "uses " *must* be followed by at least one non-line-break
character.
6. The total sequence *must* end with a period.
7. The period *must* be at the end of a line.

Note that if any of these conditions fail, the entire match fails.

This evaluates to the following regular expression:

(?m)^.+(?=uses (?!a spoon)).+\.$

The first part ("(?m)") indicates that the characters '^' and '$' match at
line breaks, rather than only at the beginning and end of the string. You
may want to remove it if you're only evaluating a single string, rather than
a series of lines in a single string.
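
To see what "(?m)" changes, here is a minimal sketch. It uses Python's re
module, which is an assumption on my part (the thread is about .NET), but the
inline "(?m)" option is written the same way in both. The two-line sample
text is made up for illustration.

```python
import re

# Hypothetical two-line input, one sentence per line.
text = "he uses a fork.\nshe uses a knife."

# With (?m), '^' and '$' match at every line boundary.
multi = re.findall(r"(?m)^.+\.$", text)

# Without it, '^' matches only at the start of the string and '$' only at
# the end, and '.' cannot cross the line break, so nothing matches.
single = re.findall(r"^.+\.$", text)

print(multi)   # ['he uses a fork.', 'she uses a knife.']
print(single)  # []
```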

This is followed by '^' (must begin at the beginning of a line or string).

This is followed by '.' with the quantifier '+' (one or more of any
non-line-break character).

This is followed by a positive look-ahead, which states that this must be
followed by the literal "uses ". In addition, the literal "uses " is
followed by a negative look-ahead which prohibits the match if "uses " is
followed by the literal "a spoon". Both conditions must be true in order to
match the first part of the regular expression (must be followed by "uses "
*not* followed by "a spoon").
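
The nested look-aheads can be tried on their own. This is a sketch in
Python's re module (an assumption; the syntax is the same in .NET), and the
helper name first_allowed_uses is made up for illustration. It shows that a
look-ahead only tests what follows and consumes nothing.

```python
import re

# Positive look-ahead containing a nested negative look-ahead.
# Look-aheads are zero-width: they test what follows without consuming it.
probe = re.compile(r"(?=uses (?!a spoon))")

def first_allowed_uses(line):
    """Return the index of the first "uses " not followed by "a spoon", or -1."""
    m = probe.search(line)
    return m.start() if m else -1

print(first_allowed_uses("he uses a fork."))   # 3
print(first_allowed_uses("he uses a spoon."))  # -1
print(first_allowed_uses("He uses a spoon, but also uses a fork."))  # 26
```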

This is followed by '.' with the quantifier '+' (one or more of any
non-line-break character), meaning that the word "uses " must be followed by
one or more non-line-break characters.

This is followed by "\." (a period, one time), followed by '$' (a line break
or the end of the string).
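
Putting the pieces together, the same expression can be written out one part
per line. This is a sketch using Python's re.VERBOSE option (an assumption;
.NET has a comparable RegexOptions.IgnorePatternWhitespace). In this mode
unescaped spaces in the pattern are ignored, so each literal space is
written as "[ ]".

```python
import re

# Same pattern as above, one piece per line.
PATTERN = re.compile(
    r"""
    ^                  # start of a line (MULTILINE)
    .+                 # one or more non-line-break characters
    (?=                # must be followed by...
        uses[ ]        #   the literal 'uses '
        (?!a[ ]spoon)  #   ...which is not followed by 'a spoon'
    )
    .+                 # one or more non-line-break characters
    \.                 # a literal period
    $                  # end of a line (MULTILINE)
    """,
    re.MULTILINE | re.VERBOSE,
)

print(bool(PATTERN.search("he uses a fork.")))   # True
print(bool(PATTERN.search("he uses a spoon.")))  # False
```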

I tested it with the following (the ones with numbers matched):

blah hoiuyy uses ouhsd7)) u. [1]
d;fj uses a fork uses spoon. [2]
lkjhlkjh uses a spoon.
usese a soon or a fork and is the glliiig.
spen duh a spoon
he uses a spoon.
he uses a fork. [3]
popiu hig spoon uses something. [4]
She uses.
uses a fork.
use a spoon.
He uses a spoon, but also uses a fork. [5]
She uses a spoon, not a fork.
He uses forks, knives, and a spoon. [6]
He uses a fork
or a spoon.
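
The test above can be reproduced with a few lines of code. This sketch uses
Python's re module rather than .NET (an assumption), and the "[n]" markers
are left out of the sample because they are annotations, not part of the
tested text.

```python
import re

PATTERN = re.compile(r"(?m)^.+(?=uses (?!a spoon)).+\.$")

SAMPLE = """\
blah hoiuyy uses ouhsd7)) u.
d;fj uses a fork uses spoon.
lkjhlkjh uses a spoon.
usese a soon or a fork and is the glliiig.
spen duh a spoon
he uses a spoon.
he uses a fork.
popiu hig spoon uses something.
She uses.
uses a fork.
use a spoon.
He uses a spoon, but also uses a fork.
She uses a spoon, not a fork.
He uses forks, knives, and a spoon.
He uses a fork
or a spoon.
"""

# The pattern has no capturing groups, so findall returns whole matches.
matches = PATTERN.findall(SAMPLE)
for number, line in enumerate(matches, start=1):
    print(f"[{number}] {line}")
```

Run as written, this prints the same six lines that carry numbers in the
list above, in the same order.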

The numbered lines matched because each contains a "uses " that is not at
the very start of the line and is followed by a sequence of characters that
does not begin with "a spoon", and the line ends with a period.

Some other notable results:

"She uses." does not match: the word "uses" is followed directly by the
period, so there is no "uses " (with a trailing space) and no characters
between it and the period.

"uses a fork." does not match: the word "uses" is the first word on the line
(not preceded by any characters).

Number 5 succeeds because of the phrase "uses a fork". The phrase "uses a
spoon" is consumed as part of the first sequence of "any non-line-break
character". Since this is followed by "uses a fork", that phrase causes the
whole line to match.

The last 2 lines do not match because of the line break between them.
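
To see how number 5 is divided up, here is a sketch that adds two capturing
groups to the expression. The groups are my addition for illustration only
and the look-aheads are unchanged. It uses Python's re module, which is an
assumption; the thread itself is about .NET.

```python
import re

# Capturing groups added purely for illustration.
GROUPED = re.compile(r"(?m)^(.+)(?=uses (?!a spoon))(.+)\.$")

m = GROUPED.search("He uses a spoon, but also uses a fork.")
print(repr(m.group(1)))  # 'He uses a spoon, but also '
print(repr(m.group(2)))  # 'uses a fork'

# A sentence split across two lines never matches: '.' stops at the line
# break, so the first line has no period and the second has no "uses ".
print(GROUPED.search("He uses a fork\nor a spoon."))  # None
```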

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

This is, by definition, not that.

"Zach" <divisortheory@xxxxxxxxx> wrote in message
news:1148774134.328699.244510@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hello,

Please forgive if this is not the most appropriate newsgroup for this
question. Unfortunately I didn't find a newsgroup specific to regular
expressions.

I have the following regular expression.

^(.+?) uses (?!a spoon)\.$

I know something is wrong with this, but essentially what I want to do
is match all strings of the form

^(.+?) uses (.+?)\.$

but only if the text inside that last group is not the string "a
spoon".

When this is tested against the string

Jim uses a fork.

the test fails. It says "Jim uses a fork." does not match the regular
expression "^(.+?) uses (?!a spoon)\.$"

Am I missing something obvious here?

Thanks for any help.


