Re: RegEx: How to ignore the number of whitespaces?



Hi! First of all, thanks for your response!

Chris Diver wrote:
Florian Haag wrote:

I want to match strings while ignoring the number of whitespaces.
In a simple case, this would of course mean something like


a\s*b would completely ignore the whitespace, unless you want at least
one.

Yes, I do need at least one whitespace. I don't want to ignore the
whitespaces altogether, I just want to ignore the number of consecutive
whitespaces.

a\s+b

which would match not only "a b", but also "a  b", "a   b" etc.

However, a case like

a\s+b?\s+b

already doesn't work for me any more, as it would only match "a  c"
(two spaces in between), not "a c" (one space in between), if the
"b" is omitted. I can override this by using an expression like

This expression will never match "a c". Are you trying to match "a b b"
as well as "a b"?

Oops, sorry - the last "b" should have been a "c", as in

a\s+b?\s+c

However, this won't match "a c" (with one space in between).
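
One possible way around this particular case is to let the optional "b" carry its own trailing whitespace, i.e. a\s+(?:b\s+)?c. The following is only a minimal sketch in Python, chosen for illustration; the constructs used (\s, +, ? and a non-capturing group) exist in most regex flavours. The helper name matches_whole is hypothetical.

```python
import re

# Pattern from the post: whitespace is required both before and after
# the optional "b", so two whitespace characters are needed when "b" is absent.
ORIGINAL = re.compile(r"a\s+b?\s+c")

# Assumed alternative: the optional part brings its own trailing whitespace.
GROUPED = re.compile(r"a\s+(?:b\s+)?c")

def matches_whole(pattern, text):
    """Return True if the compiled pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

for sample in ["a c", "a  c", "a b c", "a   b  c", "ac"]:
    print(repr(sample), matches_whole(ORIGINAL, sample), matches_whole(GROUPED, sample))
```

With this grouping, "a c" (one space) is accepted while "ac" is still rejected. It only helps when the optional part and its neighbouring whitespace can be grouped by hand, so it does not answer the general question.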

Things get even more complicated in cases like this:

(a|b\s+)(c|\s+d)

It seems to me that I cannot evaluate this directly but instead
have to replace it with

ac|a\s+d|b\s+c|b\s+d

Going by the above, the rule for your strings is that they can be any
of the following:

ac
a (at least one space) d
b (at least one space) c
b (at least one space) d

Yes, that's correct - my question is whether there is another way than
resolving (a|b\s+)(c|\s+d) to ac|a\s+d|b\s+c|b\s+d (which would
obviously mean creating all possible combinations of the (...|...)
parts). If each bracket holds more than two alternatives, this would
mean an enormous increase in the size of the RegEx, which I'd like to
avoid, if possible.
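
To make the difference concrete: evaluated directly, (a|b\s+)(c|\s+d) demands two whitespace characters between "b" and "d", because the two \s+ end up adjacent. The sketch below (Python, for illustration only) shows this, together with one possible alternative that avoids the expansion: replacing each \s+ with a token that is satisfied either by consuming whitespace or by whitespace already sitting immediately before the current position. The token WS and the helper accepts are assumptions of this sketch, and the approach requires a regex engine with lookbehind support.

```python
import re

DIRECT = re.compile(r"(a|b\s+)(c|\s+d)")          # evaluated as written
EXPANDED = re.compile(r"ac|a\s+d|b\s+c|b\s+d")    # all combinations spelled out

# Assumed token: succeeds if the preceding character is whitespace,
# or if at least one whitespace character is consumed here.
WS = r"(?:(?<=\s)|\s)\s*"
TOKENIZED = re.compile(rf"(a|b{WS})(c|{WS}d)")

def accepts(rx, text):
    """Return True if the compiled regex matches the entire text."""
    return rx.fullmatch(text) is not None

for text in ["ac", "a d", "b c", "b d", "b  d", "bd", "ad"]:
    print(repr(text), accepts(DIRECT, text), accepts(EXPANDED, text), accepts(TOKENIZED, text))
```

In this sketch the direct pattern rejects "b d" with a single space, while the expanded and the tokenized patterns agree on every sample. Two adjacent WS tokens behave like one, which is what keeps the pattern from growing combinatorially.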

This sounds like homework to me. I don't understand the format of the
strings that are supposed to match.

It's definitely not my homework; it's actually for a vocabulary
training programme the first version of which you can find here:

http://VocDB.de.vu

The input strings are supposed to have the following format:

a and b may be replaced with any characters (or chains thereof) except
\[]()|.
\ preceding any of \[]()| escapes the respective symbol; otherwise
the symbol has a special meaning, as described below.
[a] means "a" is optional.
[a|b] means either "a" or "b" or nothing may be written.
(a|b) means either "a" or "b" must be written.

There can be more than one | within each pair of brackets, delimiting
more than two alternatives.

i.e. the whole thing is something slightly Regex-like for
non-programmers.

In version 1 of the above programme, I use my own evaluator for this.
However, for the sake of maintainability, I hoped I could eventually
switch to simply converting those input patterns into RegEx-strings.
If only there were a way to ignore the number of subsequent whitespaces
without ignoring that there _are_ whitespaces at all at certain places
in the word.
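
A rough sketch of what such a conversion could look like, written in Python purely for illustration. It is not the evaluator from version 1 of the programme. It assumes that every run of whitespace in the input pattern becomes a token that succeeds either by consuming whitespace or by finding whitespace immediately before the current position (this needs lookbehind support in the regex engine). The names WS, to_regex and accepts are hypothetical.

```python
import re

# Assumed whitespace token: at least one whitespace character must be
# consumed here, unless the preceding character already is whitespace.
WS = r"(?:(?<=\s)|\s)\s*"

SPECIAL = set("\\[]()|")

def to_regex(pattern):
    """Translate the bracket syntax described above into a regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in SPECIAL:
            out.append(re.escape(pattern[i + 1]))  # escaped special symbol
            i += 2
            continue
        if ch.isspace():
            # Collapse a whole run of whitespace into a single WS token.
            while i < len(pattern) and pattern[i].isspace():
                i += 1
            out.append(WS)
            continue
        if ch in "[(":
            out.append("(?:")          # both bracket kinds open a group
        elif ch == "]":
            out.append(")?")           # [...] is optional
        elif ch == ")":
            out.append(")")            # (...) is mandatory
        elif ch == "|":
            out.append("|")
        else:
            out.append(re.escape(ch))  # ordinary character, matched literally
        i += 1
    return "".join(out)

def accepts(pattern, answer):
    """Return True if the whole answer matches the converted pattern."""
    return re.fullmatch(to_regex(pattern), answer) is not None

print(accepts("a [b] c", "a c"), accepts("a [b] c", "ac"))
```

Under these assumptions "a [b] c" accepts "a c", "a b c" and "a   b  c" but not "ac". One known gap in the sketch: an optional part at the very start, as in "[a] b", still requires leading whitespace when "a" is omitted, so the ends of the pattern would need separate handling.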

Kind regards,
Florian