Re: RegEx: How to ignore the number of whitespaces?
- From: "Kevin Spencer" <unclechutney@xxxxxxxxxxxx>
- Date: Mon, 18 Jun 2007 06:52:36 -0400
If you can explain the requirements of the pattern you're trying to match,
without using any regular expression terminology, I can help. A regular
expression is a sequence of characters that represents a pattern, or a set of
rules regarding what is to be matched in text. Since you're having trouble
creating the regular expression, using regular expression symbol terminology
to explain the rules only confuses the issue.
Here's an example of what I mean:
"I want to match any number (greater than 0) of sequences of 1 or more
alphanumeric (only) characters with no spaces between them. Each sequence is
separated from the others by a single space, which may be any white space
character except for a line break. Any non-alpha-numeric character other
than a non-line-break white space character terminates a matching sequence."
Note that no regular expression terminology is used in the above
description. It describes the rules for a matching character sequence,
including what is required, how many of what is required are required, what
is NOT required, and what is prohibited.
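For illustration only, a plain-language description like the one above could be turned into a pattern roughly as follows. This is a sketch using Python's `re` module, not something from the original post. It assumes "alphanumeric" means ASCII letters and digits and "line break" means CR or LF; the names `WORDS` and `first_match` are made up for the example.

```python
import re

# One or more ASCII alphanumerics, followed by zero or more groups of
# "exactly one non-line-break whitespace character, then one or more
# alphanumerics". [^\S\r\n] reads as "whitespace, but not CR or LF".
WORDS = re.compile(r"[A-Za-z0-9]+(?:[^\S\r\n][A-Za-z0-9]+)*")

def first_match(text):
    """Return the first matching run of space-separated words, or None."""
    m = WORDS.search(text)
    return m.group(0) if m else None
```

With this sketch, a second consecutive space, a line break, or punctuation ends the match, which is what the description asks for.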
--
HTH,
Kevin Spencer
Microsoft MVP
Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net
"Florian Haag" <florianhaag@xxxxxxxx> wrote in message
news:ec9IPN3rHHA.2376@xxxxxxxxxxxxxxxxxxxxxxx
Hi! First of all, thanks for your response!
Chris Diver wrote:
Florian Haag wrote:
I want to match strings while ignoring the number of whitespaces.
In a simple case, this would of course mean something like
a\s*b would completely ignore the whitespaces, unless you want at least
one.
Yes, I do need at least one whitespace. I don't want to ignore the
whitespaces altogether, I just want to ignore the number of subsequent
whitespaces.
a\s+b
which would match not only "a b", but also "a  b", "a   b" etc.
However, a case like
a\s+b?\s+b
already doesn't work for me any more, as it would only match "a  c"
(two spaces in between), not "a c" (one space in between), if the
"b" is omitted. I can override this by using an expression like
This expression will never match a c. Are you trying to match a b b
as well as a b?
Oops, sorry - the last "b" should have been a "c", as in
a\s+b?\s+c
However, this won't match "a c" (with one space in between).
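A quick way to see this behaviour, together with one possible workaround, sketched with Python's `re` module (the constructs used here, `\s+`, `?` and `(?:...)`, also exist in .NET regular expressions). The workaround is an assumption about what is wanted: the whitespace after "b" is made optional together with "b". The names `original`, `grouped` and `matches` are invented for the example.

```python
import re

# Two separate mandatory whitespace runs: needs at least two whitespace
# characters when "b" is left out.
original = re.compile(r"a\s+b?\s+c")

# The whitespace that follows "b" is optional together with "b", so a
# single space is enough when "b" is left out.
grouped = re.compile(r"a\s+(?:b\s+)?c")

def matches(pattern, text):
    """True if the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

Here `matches(original, "a c")` is false while `matches(grouped, "a c")` is true, and both still accept "a b c".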
Things get even more complicated in cases like this:
(a|b\s+)(c|\s+d)
It seems to me that I cannot evaluate this directly but instead
have to replace it with
ac|a\s+d|b\s+c|b\s+d
By the above, the rules for the pattern of your strings are that it
can be any of:
ac
a (at least one space) d
b (at least one space) c
b (at least one space) d
Yes, that's correct - my question is whether I can go another way than
resolving (a|b\s+)(c|\s+d) to ac|a\s+d|b\s+c|b\s+d (which would
obviously mean to create all possible combinations of the (...|...)
parts). If each bracket holds more than two alternatives, this would
mean an enormous increase in the size of the RegEx, which I'd like to
avoid, if possible.
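One approach that might avoid the expansion, offered as a sketch rather than a tested .NET solution: the unexpanded (a|b\s+)(c|\s+d) differs from the expanded list only because b\s+ followed by \s+d demands at least two whitespace characters. If every "whitespace required here" position is written so that it is also satisfied when the previous character was already whitespace, adjacent requirements merge into one. The example below uses a lookbehind in Python's `re` module; `WS`, `PATTERN` and `matches` are names invented for the example, and carrying it over to .NET is an assumption.

```python
import re

# "At least one whitespace character here, unless the character just
# before this position already was whitespace."
WS = r"(?:(?<=\s)\s*|\s+)"

# (a|b\s+)(c|\s+d) with every \s+ swapped for WS; no expansion into
# ac|a\s+d|b\s+c|b\s+d is needed.
PATTERN = re.compile(rf"(?:a|b{WS})(?:c|{WS}d)")

def matches(text):
    """True if the whole text matches PATTERN."""
    return PATTERN.fullmatch(text) is not None
```

With this, "b d" (one space) is accepted just like "b   d", while "bd", "ad" and "bc" are still rejected, so the pattern grows linearly with the number of alternatives instead of multiplying out.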
Sounds like your homework to me. I don't understand the format of
the strings they are supposed to match.
It's definitely not my homework; it's actually for a vocabulary
training programme the first version of which you can find here:
http://VocDB.de.vu
The input strings are supposed to have the following format:
a and b may be replaced with any characters (or chains thereof) except
\[]()|.
\ preceding either of \[]()| escapes the respective symbol, otherwise
it'll have a special meaning, as described below.
[a] means "a" is optional.
[a|b] means either "a" or "b" or nothing may be written.
(a|b) means either "a" or "b" must be written.
There can be more than one | within each pair of brackets, delimiting
more than two alternatives.
i.e. the whole thing is something slightly Regex-like for
non-programmers.
In version 1 of the above programme, I use my own evaluator for this.
However, for the sake of maintainability, I hoped I could eventually
switch to simply converting those input patterns into RegEx-strings.
If only there were a way to ignore the number of subsequent whitespaces
without ignoring that there _are_ whitespaces at all at certain places
in the word.
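A minimal sketch of such a conversion, assuming the format rules listed above and written in Python's `re` module for illustration. It treats any run of whitespace in the input pattern as "at least one whitespace, count ignored, and already satisfied if the previous character was whitespace". The names `to_regex`, `accepts`, `WS` and `SPECIAL` are hypothetical, and the sketch does no validation of bracket pairing.

```python
import re

# Whitespace requirement that merges with any whitespace just consumed.
WS = r"(?:(?<=\s)\s*|\s+)"

# [..] is an optional group, (..) a required group, | separates alternatives.
SPECIAL = {"[": "(?:", "]": ")?", "(": "(?:", ")": ")", "|": "|"}

def to_regex(pattern):
    """Translate a vocabulary-style pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escapes the next character: emit it as a literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch.isspace():
            # Collapse a whole run of whitespace into one WS requirement.
            while i < len(pattern) and pattern[i].isspace():
                i += 1
            out.append(WS)
        else:
            # Bracket or bar becomes regex syntax; anything else is literal.
            out.append(SPECIAL.get(ch, re.escape(ch)))
            i += 1
    return re.compile("".join(out))

def accepts(pattern, answer):
    """True if the whole answer matches the converted pattern."""
    return to_regex(pattern).fullmatch(answer) is not None
```

For example, `accepts("a [b] c", "a c")` holds with a single space, while `accepts("a [b] c", "a bc")` does not.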
Kind regards,
Florian