Re: RegEx: How to ignore the number of whitespaces?
- From: "Kevin Spencer" <unclechutney@xxxxxxxxxxxx>
- Date: Thu, 21 Jun 2007 08:19:16 -0400
Hi Florian,
I must admit your situation is confusing, and I do think that creating a
"simpler" regular expression syntax is likely to bite you eventually, one
way or another, but requirements are requirements, and my job is to help you
solve your problem. So...
I'm still a little in the dark as to the full scope of what you're doing,
but it may not be necessary to understand the whole thing in order to solve
this particular problem. If I understand you correctly, you're looking for a
way to require at least one white space character between separate character
sequences in a string, except that some of these character sequences may be
"marked" as optional, in which case no white space would be necessary.
If so, I believe this can be solved using a conditional expression:
this(?(?=.)\s+)
This is a regular expression "if" conditional, that is, a regular
expression "if/else" conditional without the "else" part. The syntax of a
regular expression "if/else" conditional is:
(?(?=regex)then|else)
This means that when the lookahead (?=regex) succeeds at the current
position, the "then" expression is used. When it does not succeed, the
"else" expression is used. So the expression above means "look for 'this'";
if anything follows it, it must be followed by at least 1 white space
character (otherwise, not).
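For readers without a .NET regex engine at hand, here is a minimal sketch of the same idea in Python. Python's re module does not accept a lookahead as the condition of (?(...)...), so the sketch spells the conditional out as the alternation (?:(?=.)\s+|(?!.)), which should behave the same way. The helper name accepts is made up for illustration.

```python
import re

# Assumed Python equivalent of the .NET pattern  this(?(?=.)\s+)
# "(?=.)\s+" plays the role of the "then" branch;
# "(?!.)" stands in for the missing (empty) "else" branch.
PATTERN = re.compile(r"this(?:(?=.)\s+|(?!.))")

def accepts(text):
    """Return the text matched at the start of the string, or None."""
    m = PATTERN.match(text)
    return m.group(0) if m else None

print(repr(accepts("this")))         # nothing follows, so no white space is needed
print(repr(accepts("this   that")))  # something follows, so white space is consumed
print(repr(accepts("thisthat")))     # something follows without white space
```

Only the third call fails to match, because a character follows "this" but it is not white space.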
For optional matches, you would use the optional operator as you've
illustrated before:
(?:this(?(?=.)\s+))?
In the following, "this" and "that" are each optional (in that order), and
the match always has to end in "other":
(?:this(?(?=\s.)\s+))?(?:that(?(?=\s.)\s+))?(?:other)
matches:
other
this other
that other
this that other
It does NOT match:
this
this that
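The lists above can be checked with a short script. This is only a sketch: it uses Python, where the conditional (?(?=\s.)\s+) has to be rewritten as the alternation (?:(?=\s.)\s+|(?!\s.)), and it anchors the pattern to the whole string with fullmatch, which is an assumption, since the pattern as posted is not anchored. The names GAP and whole_match are hypothetical.

```python
import re

# Assumed Python spelling of the conditional (?(?=\s.)\s+):
# consume white space when white space plus another character follows,
# otherwise consume nothing.
GAP = r"(?:(?=\s.)\s+|(?!\s.))"
PATTERN = re.compile(rf"(?:this{GAP})?(?:that{GAP})?(?:other)")

def whole_match(text):
    """True when the entire string matches (anchoring is an assumption here)."""
    return PATTERN.fullmatch(text) is not None

samples = ["other", "this other", "that other", "this that other",
           "this    other", "this", "this that"]
for sample in samples:
    print(f"{sample!r:20} {whole_match(sample)}")
```

In this sketch the first five samples are accepted and the last two are rejected. One thing to watch: because the condition (?=\s.) only fires when white space is already present, "thisother" is accepted as well, so a stricter condition such as (?=.) may be needed if the white space is meant to be mandatory.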
--
HTH,
Kevin Spencer
Microsoft MVP
Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net
"Florian Haag" <florianhaag@xxxxxxxx> wrote in message
news:%23sBSUA0sHHA.4548@xxxxxxxxxxxxxxxxxxxxxxx
Hi!
Kevin Spencer wrote:
This sounds like the "patterns" are performing the work of regular
expressions, matching character sequences in strings.
That's right.
What I don't
understand is why you want to create a new regular expression syntax
which your users must learn, then convert it to the original, rather
than using the original?
Some 95% of my users won't have any programming experience whatsoever,
or any computer science background. I doubt usual regular expressions
with all their features would be suitable for those inexperienced users.
I'd expect it to be very hard to explain, for example, why they must write
\. and \? instead of simply writing a full stop or a question mark.
All the more, my "space character problem" would remain, for my users
do not understand why the pattern "personal computer" will only match
"personal computer", but not "personal  computer" (two spaces in
between), for it's the same words. At the same time, they'd consider
writing patterns like "personal *computer" (or even
"personal\s*computer") way too unintuitive to use my programme.
That's why I offer another pattern syntax with a very limited set of a
few special characters which denote very few pattern features (optional
pattern parts, alternative pattern parts) and everything else one could
possibly write into a pattern will be evaluated just as it's been input.
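The two requirements described here (users type plain text, and any run of white space counts as one separator) can be sketched in a few lines of Python. This is an illustration under assumptions, not the programme's actual code: the function name literal_to_regex is hypothetical, and it covers plain words only, not the special characters of the pattern syntax.

```python
import re

def literal_to_regex(pattern):
    """Escape every word and let any run of white space separate the words."""
    words = pattern.split()
    return r"\s+".join(re.escape(word) for word in words)

regex = re.compile(literal_to_regex("personal computer"))
print(bool(regex.fullmatch("personal computer")))    # one space
print(bool(regex.fullmatch("personal  computer")))   # two spaces
print(bool(regex.fullmatch("personalcomputer")))     # no space at all

# Users can type "." and "?" directly; re.escape adds the backslashes.
print(literal_to_regex("Really? Yes."))
```

The users never see the backslashes or the \s+, which is the point of a separate pattern syntax.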
Second, what are the limitations of the "arbitrary Unicode
characters?"
Actually, that means all Unicode characters except spaces. By
"arbitrary", I wanted to express that any characters may appear in any
order without any restrictions in a pattern and should match just like
that.
Pardon for not describing it very accurately :-$
Okay, we've discussed "arbitrary," but now you will need to define
the term "marked." As the "patterns" are pure text, the "marks" must
also be text. But what constitutes a "text" character and a "mark"
character, and how do you escape text characters to create marks?
Right - there are a few Unicode characters which have to be escaped
(which were chosen in a way that they don't appear in regular
vocabulary, anyway). These are: \ ( ) [ ] |
If any of these characters is meant to actually be found in the
string, it has to be preceded by a backslash.
Otherwise, pairs of both ( and ) as well as [ and ] "mark" a part of a
pattern.
Within such a marked part, there may be any number (greater than zero)
of alternative patterns, each separated by a | character.
If ( and ) are used to denote the part of the pattern, exactly one of
the alternative patterns must appear in the string.
If [ and ] are used to denote the part of the pattern, at most one of
the alternative patterns may appear in the string.
Such marked parts may be nested to an unlimited depth, that is, each of
the above alternative patterns may contain marked parts of its own.
That should be all about the syntax of my patterns, as they are already
used in version 1 of my programme.
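The syntax rules above are compact enough to sketch as a small recursive translator. The following Python is a hypothetical illustration, not the programme's implementation: translate and _sequence are made-up names, runs of white space are naively turned into \s+, and an unescaped | outside a marked part is treated as an error, which is an assumption.

```python
import re

CLOSERS = {"(": ")", "[": "]"}

def translate(pattern):
    """Translate the bracket syntax described above into a regex string."""
    regex, _ = _sequence(pattern, 0, None)
    return regex

def _sequence(pattern, pos, closer):
    """Translate alternatives up to `closer` (None means end of input)."""
    alternatives, current = [], []
    while pos < len(pattern):
        ch = pattern[pos]
        if ch == "\\":                    # backslash: next character is literal
            if pos + 1 >= len(pattern):
                raise ValueError("dangling backslash")
            current.append(re.escape(pattern[pos + 1]))
            pos += 2
        elif ch in CLOSERS:               # marked part; recurse to allow nesting
            inner, pos = _sequence(pattern, pos + 1, CLOSERS[ch])
            # ( ) means exactly one alternative, [ ] means at most one
            current.append(f"(?:{inner})" + ("?" if ch == "[" else ""))
        elif ch == closer:                # end of the marked part we are in
            alternatives.append("".join(current))
            return "|".join(alternatives), pos + 1
        elif ch in ")]":
            raise ValueError(f"unbalanced {ch!r} at position {pos}")
        elif ch == "|":                   # separator between alternatives
            if closer is None:
                raise ValueError("'|' outside a marked part must be escaped")
            alternatives.append("".join(current))
            current = []
            pos += 1
        elif ch.isspace():                # any run of white space becomes \s+
            while pos < len(pattern) and pattern[pos].isspace():
                pos += 1
            current.append(r"\s+")
        else:                             # everything else matches itself
            current.append(re.escape(ch))
            pos += 1
    if closer is not None:
        raise ValueError(f"missing {closer!r}")
    alternatives.append("".join(current))
    return "|".join(alternatives), pos

demo = translate("personal [desktop|laptop] computer")
print(demo)
print(bool(re.fullmatch(demo, "personal  laptop computer")))
print(bool(re.fullmatch(demo, "personal computer")))
```

The last check comes out False: with the optional part absent, the naive translation still demands two separate runs of white space. That is the white space problem this thread is about, and the sketch deliberately leaves it unsolved.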
Regards,
Florian