Re: RegEx: How to ignore the number of whitespaces?
- From: "Florian Haag" <florianhaag@xxxxxxxx>
- Date: Fri, 15 Jun 2007 10:33:37 -0700
Hi! First of all, thanks for your response!
Chris Diver wrote:
Florian Haag wrote:
I want to match strings while ignoring the number of whitespaces.
In a simple case, this would of course mean something like
a\s*b would completely ignore the whitespaces, unless you want at least
one.
Yes, I do need at least one whitespace. I don't want to ignore the
whitespaces altogether, I just want to ignore the number of subsequent
whitespaces.
a\s+b
which would match not only "a b", but also "a  b", "a   b" etc.
However, a case like
a\s+b?\s+b
already doesn't work for me any more, as it would only match "a  c"
(two spaces in between), not "a c" (one space in between), if the
"b" is omitted. I can override this by using an expression like
This expression will never match "a c". Are you trying to match "a b b"
as well as "a b"?
Oops, sorry - the last "b" should have been a "c", as in
a\s+b?\s+c
However, this won't match "a c" (with one space in between).
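For illustration, the mismatch can be reproduced in a few lines. The sketch below uses Python's re module (an assumption, since the thread names no language) and compares the pattern above with a variant that groups the optional "b" together with the whitespace following it. The grouped variant is only a suggestion for this one simple case, not something proposed in the thread.

```python
import re

# Pattern from the post: two separate \s+ runs around an optional "b".
original = re.compile(r"a\s+b?\s+c")

# Hypothetical alternative: make "b plus its trailing whitespace" optional
# as one unit, so only one whitespace run remains when "b" is omitted.
grouped = re.compile(r"a\s+(?:b\s+)?c")

for text in ["a b c", "a  c", "a c"]:
    print(repr(text),
          "original:", bool(original.fullmatch(text)),
          "grouped:", bool(grouped.fullmatch(text)))
```

With fullmatch, the original pattern rejects "a c" with a single space, while the grouped variant accepts it and still rejects "a bc" and "ac".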
Things get even more complicated in cases like this:
(a|b\s+)(c|\s+d)
It seems to me that I cannot evaluate this directly but instead
have to replace it with
ac|a\s+d|b\s+c|b\s+d
By that above, the rules about the pattern of your strings are that it
can be either:
ac
a (at least one space) d
b (at least one space) c
b (at least one space) d
Yes, that's correct - my question is whether I can go another way than
resolving (a|b\s+)(c|\s+d) to ac|a\s+d|b\s+c|b\s+d (which would
obviously mean to create all possible combinations of the (...|...)
parts). If each bracket holds more than two alternatives, this would
mean an enormous increase in the size of the RegEx, which I'd like to
avoid, if possible.
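One possible way to avoid the expansion, sketched here in Python (an assumption; any engine with lookbehind should behave similarly), is to replace each \s+ with a token meaning "at least one whitespace here, where whitespace matched immediately before also counts". The name WS and the sample strings are made up for this illustration, and the idea has only been checked against the small example above.

```python
import re

# Hypothetical whitespace token: either consume one or more whitespace
# characters, or match zero-width if the previous character was whitespace.
WS = r"(?:\s+|(?<=\s))"

naive = re.compile(r"(?:a|b\s+)(?:c|\s+d)")          # direct translation
compact = re.compile(rf"(?:a|b{WS})(?:c|{WS}d)")     # same shape, WS token
expanded = re.compile(r"ac|a\s+d|b\s+c|b\s+d")       # all combinations

samples = ["ac", "a d", "b c", "b d", "b   d", "ad", "bc", "bd"]
for s in samples:
    print(repr(s),
          "naive:", bool(naive.fullmatch(s)),
          "compact:", bool(compact.fullmatch(s)),
          "expanded:", bool(expanded.fullmatch(s)))
```

On these samples the compact pattern agrees with the fully expanded one, whereas the direct translation rejects "b d" with a single space because its two adjacent \s+ runs demand two whitespace characters.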
Sounds like your homework to me. I don't understand the format of
the strings that are supposed to match.
It's definitely not my homework; it's actually for a vocabulary
training programme the first version of which you can find here:
http://VocDB.de.vu
The input strings are supposed to have the following format:
a and b may be replaced with any characters (or chains thereof) except
\[]()|.
\ preceding either of \[]()| escapes the respective symbol, otherwise
it'll have a special meaning, as described below.
[a] means "a" is optional.
[a|b] means either "a" or "b" or nothing may be written.
(a|b) means either "a" or "b" must be written.
There can be more than one | within each pair of brackets, delimiting
more than two alternatives.
i.e. the whole thing is something slightly Regex-like for
non-programmers.
In version 1 of the above programme, I use my own evaluator for this.
However, for the sake of maintainability, I hoped I could eventually
switch to simply converting those input patterns into RegEx-strings.
If only there were a way to ignore the number of subsequent whitespaces
without ignoring that there _are_ whitespaces at all at certain places
in the word.
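As a rough sketch of what such a conversion might look like, here is a small Python translator for the format described above. Several things are assumptions and not taken from the actual programme: the function names to_regex and matches, the choice of Python, that brackets may be nested, and that a run of whitespace in the input pattern means "at least one whitespace, where whitespace matched just before also counts". Bracket balance is not validated.

```python
import re

# Assumed meaning of a whitespace run in the input pattern: one or more
# whitespace characters, or zero-width if whitespace was just matched.
WS = r"(?:\s+|(?<=\s))"


def to_regex(pattern: str) -> str:
    """Translate the bracket notation into a regex string (sketch only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped symbol is literal
            i += 2
            continue
        if ch in "[(":
            out.append("(?:")        # both bracket kinds open a group
        elif ch == "]":
            out.append(")?")         # [...] is optional
        elif ch == ")":
            out.append(")")          # (...) is required
        elif ch == "|":
            out.append("|")          # alternative separator
        elif ch.isspace():
            # Collapse the whole whitespace run into a single WS token.
            while i + 1 < len(pattern) and pattern[i + 1].isspace():
                i += 1
            out.append(WS)
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def matches(pattern: str, answer: str) -> bool:
    """Check a typed answer against an input pattern."""
    return re.fullmatch(to_regex(pattern), answer) is not None


print(matches("a [b ]c", "a c"))         # optional "b" omitted, one space
print(matches("(a|b )(c| d)", "b d"))    # adjacent whitespace alternatives
```

Since each bracket pair maps to exactly one regex group, the size of the generated expression grows with the length of the input pattern and not with the number of combinations.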
Kind regards,
Florian