Re: RegEx: How to ignore the number of whitespaces?



If you can explain the requirements of the pattern you're trying to match,
without using any regular expression terminology, I can help. A regular
expression is a sequence of characters that represents a pattern, that is, a
set of rules about what is to be matched in text. Since you're having trouble
creating the regular expression, using regular expression symbols to explain
the rules only confuses the issue.

Here's an example of what I mean:

"I want to match any number (greater than 0) of sequences of 1 or more
alphanumeric (only) characters with no spaces between them. Each sequence is
separated from the others by a single space, which may be any white space
character except for a line break. Any non-alpha-numeric character other
than a non-line-break white space character terminates a matching sequence."

Note that no regular expression terminology is used in the above
description. It describes the rules for a matching character sequence,
including what is required, how many of what is required are required, what
is NOT required, and what is prohibited.
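
For illustration only, here is one way the plain-language description above
might be turned into a pattern. This is a sketch in Python, not part of the
original exchange; the names WORDS and first_match are hypothetical, and
"alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# One or more alphanumeric runs, each separated from the next by exactly
# one whitespace character that is not a line break.
# [^\S\r\n] reads as "a whitespace character other than CR or LF".
WORDS = re.compile(r"[A-Za-z0-9]+(?:[^\S\r\n][A-Za-z0-9]+)*")

def first_match(text):
    """Return the first matching sequence in text, or None if there is none."""
    m = WORDS.search(text)
    return m.group(0) if m else None
```

Any character outside the allowed set, including a second consecutive space
or a line break, ends the match, which is what the last sentence of the
description asks for.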

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

"Florian Haag" <florianhaag@xxxxxxxx> wrote in message
news:ec9IPN3rHHA.2376@xxxxxxxxxxxxxxxxxxxxxxx
Hi! First of all, thanks for your response!

Chris Diver wrote:
Florian Haag wrote:

I want to match strings while ignoring the number of whitespaces.
In a simple case, this would of course mean something like


a\s*b would completely ignore the whitespaces, unless you want at least
one.

Yes, I do need at least one whitespace. I don't want to ignore the
whitespaces altogether, I just want to ignore the number of consecutive
whitespaces.

a\s+b

which would match not only "a b", but also "a  b", "a   b" etc.

However, a case like

a\s+b?\s+b

already doesn't work for me any more, as it would only match "a  c"
(two spaces in between), not "a c" (one space in between), if the
"b" is omitted. I can override this by using an expression like

This expression will never match "a c". Are you trying to match "a b b"
as well as "a b"?

Oops, sorry - the last "b" should have been a "c", as in

a\s+b?\s+c

However, this won't match "a c" (with one space in between).
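
The reason is that a\s+b?\s+c contains two separate \s+ parts, and each must
consume at least one whitespace character even when the "b" is absent. A
minimal sketch of one common workaround, assuming Python's re module and
hypothetical variable names, is to attach the second whitespace run to the
optional "b" so that both disappear together:

```python
import re

# Original pattern: two mandatory whitespace runs around an optional "b".
original = re.compile(r"a\s+b?\s+c")

# Variant: the whitespace after "b" is only required when "b" is present.
grouped = re.compile(r"a\s+(?:b\s+)?c")

for text in ["a c", "a  c", "a b c", "ac"]:
    print(repr(text),
          bool(original.fullmatch(text)),
          bool(grouped.fullmatch(text)))
```

With this grouping, "a c" with a single space matches, while "ac" with no
space still does not.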

Things get even more complicated in cases like this:

(a|b\s+)(c|\s+d)

It seems to me that I cannot evaluate this directly but instead
have to replace it with

ac|a\s+d|b\s+c|b\s+d

Going by the above, the rule for your strings is that they can be any
of the following:

ac
a (at least one space) d
b (at least one space) c
b (at least one space) d

Yes, that's correct - my question is whether I can go another way than
resolving (a|b\s+)(c|\s+d) to ac|a\s+d|b\s+c|b\s+d (which would
obviously mean creating all possible combinations of the (...|...)
parts). If each bracket holds more than two alternatives, this would
mean an enormous increase in the size of the RegEx, which I'd like to
avoid, if possible.
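
To make the difficulty concrete: evaluated directly, (a|b\s+)(c|\s+d) needs
two whitespace characters between "b" and "d", because each \s+ must consume
at least one. The sketch below, in Python, compares the direct and expanded
forms with a third variant that is only an assumption on my part, not
something proposed in this thread: a hypothetical WS token meaning "consume
whitespace here, or already be standing directly after a whitespace
character" (a lookbehind).

```python
import re

# Hypothetical helper token, not from the thread: either consume a run of
# whitespace, or succeed with zero width if the previous character was
# already whitespace.
WS = r"(?:\s+|(?<=\s))"

direct = re.compile(r"(a|b\s+)(c|\s+d)")
expanded = re.compile(r"ac|a\s+d|b\s+c|b\s+d")
tolerant = re.compile(rf"(?:a|b{WS})(?:c|{WS}d)")

samples = ["ac", "a d", "b c", "b d", "b  d", "bd", "ad", "bc"]
for text in samples:
    print(repr(text),
          bool(direct.fullmatch(text)),
          bool(expanded.fullmatch(text)),
          bool(tolerant.fullmatch(text)))
```

On these samples the tolerant variant agrees with the fully expanded
pattern while keeping the grouped shape, so its size grows with the number
of alternatives rather than with the number of combinations.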

Sounds like your homework to me. I don't understand what format the
strings are supposed to match.

It's definitely not my homework; it's actually for a vocabulary
training programme the first version of which you can find here:

http://VocDB.de.vu

The input strings are supposed to have the following format:

a and b may be replaced with any characters (or chains thereof) except
\[]()|.
\ preceding any of \[]()| escapes the respective symbol; otherwise
the symbol has a special meaning, as described below.
[a] means "a" is optional.
[a|b] means either "a" or "b" or nothing may be written.
(a|b) means either "a" or "b" must be written.

There can be more than one | within each pair of brackets, delimiting
more than two alternatives.

i.e. the whole thing is something slightly Regex-like for
non-programmers.

In version 1 of the above programme, I use my own evaluator for this.
However, for the sake of maintainability, I hoped I could eventually
switch to simply converting those input patterns into RegEx strings.
If only there were a way to ignore the number of consecutive whitespaces
without ignoring that there _are_ whitespaces at all at certain places
in the word.
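
A rough sketch of what such a conversion could look like, in Python. The
function names to_regex and matches and the WS token are hypothetical, and
the lookbehind-based whitespace handling is an assumption of this sketch
rather than a method confirmed anywhere in the thread. It also assumes
that brackets in the input pattern are balanced.

```python
import re

# Hypothetical whitespace token: consume a run of whitespace, or succeed
# with zero width if the previous character was already whitespace.
WS = r"(?:\s+|(?<=\s))"

def to_regex(pattern):
    """Translate the bracket notation into a regular expression string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escapes the next symbol: emit it as a literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch in "[(":
            out.append("(?:")      # both bracket kinds open a group
        elif ch == "]":
            out.append(")?")       # [...] is optional
        elif ch == ")":
            out.append(")")        # (...) is required
        elif ch == "|":
            out.append("|")        # alternative separator
        elif ch.isspace():
            out.append(WS)         # any amount of whitespace, at least one
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def matches(pattern, answer):
    """True if the whole answer fits the bracket-notation pattern."""
    return re.fullmatch(to_regex(pattern), answer) is not None
```

For example, matches("a [b] c", "a c") and matches("(a|b )(c| d)", "b d")
both succeed under this sketch, while matches("a [b] c", "ac") does not.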

Kind regards,
Florian

