Re: regular expression question



Hi Ludwig,

We're getting closer, but remember that close only counts in horseshoes and
hand-grenades, not in programming!

To use Regular Expressions, you must be *absolutely specific* about your
rules.

> Well, I'm working on an editor control that supports syntax
> highlighting. I have a list of words that should be highlighted when
> typed in the editor, for example 'public', 'class', etc.

> So at a given time, the user types in the word public, and when the
> character 'c' is typed, the word 'public' is colored in blue, for
> example.

Let me explain what is missing here. "Syntax" means nothing to Regular
Expressions, and very little to humans. That is, it can refer to so many
different things (such as the "syntax" I'm using to write this post) that it
identifies nothing in and of itself.

> I have a list of words that should be highlighted when
> typed in the editor

That is what you think you mean, but that is not what you mean. For example,
note the 2 uses of "public" in the following example:

public string Opened()
{
return "Open to the public";
}

Now, the first instance of "public" is syntax, but the second is part of a
string. In other words, "syntax" is a set of rules. From Dictionary.com,
"syntax" means:

"The rules governing the formation of statements in a programming language."

Obviously, "public" as part of a string is not syntax. How do you expect to
tell the Regular expression the difference? You must know the exact syntax
rules, and be able to express them in Regular Expression syntax.
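The following is a minimal Python sketch, for illustration only. Python's `re` module stands in for .NET's `Regex`, although the idea carries over. The keyword list and the `keyword_spans` name are hypothetical. It shows what "knowing the rules" means here: the pattern matches string literals *first*, so "public" inside a string is consumed as part of the literal and never reported as a keyword.

```python
import re

# Hypothetical keyword list; a real editor would supply its own.
KEYWORDS = ["public", "class", "string", "return"]

# At each position the alternatives are tried left to right, so a whole
# double-quoted literal is consumed before the keyword alternative can
# look inside it.
TOKEN_RE = re.compile(
    r'(?P<string>"(?:[^"\\\n]|\\.)*")'
    r'|(?P<keyword>\b(?:' + "|".join(map(re.escape, KEYWORDS)) + r')\b)'
)

def keyword_spans(source):
    """Return (start, end) spans of keywords that are not inside string literals."""
    return [m.span("keyword") for m in TOKEN_RE.finditer(source)
            if m.lastgroup == "keyword"]

code = '''public string Opened()
{
return "Open to the public";
}'''

for start, end in keyword_spans(code):
    print(start, code[start:end])
# 0 public
# 7 string
# 25 return
```

This is still only a sketch. It ignores comments, character literals and verbatim strings, and each of those is one more rule you would have to spell out.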

> At the moment, I use the pattern '\b\w' to identify the first

Of course, this is unsuitable. The "\b" expression matches the position at
the beginning or end of a word, that is, a run of characters composed
entirely of word characters, and as I said before, '<' is not a word character.
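Here is a quick check in Python, used only for illustration, of where "\b" actually matches in a tag. Because "\b\w+" must start on a word character, the '<' can never be part of the match.

```python
import re

text = "<sometagname>"

# \b\w+ can only start on a word character, so '<' is never included.
print(re.findall(r"\b\w+", text))  # ['sometagname']

# \b is zero-width: it matches only between a word and a non-word
# character (or a string edge next to a word character). Position 0 has
# no word character on either side, so there is no boundary before '<'.
print([m.start() for m in re.finditer(r"\b", text)])  # [1, 12]
```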

> However, there are also XML tags that need to be highlighted; for
> example, <sometagname>: if the user types in the '<', it should be
> colored; if he then types the last 'e' of 'sometagname', the word
> 'sometagname' should be colored; if he then types '>', that too should
> be colored.

Okay, now you've introduced the topic of XML, which was not part of your
earlier message, nor of your current post up to this point.
Yet you have still not stated what you mean by "syntax highlighting," nor what
this "syntax" is for. I could assume that you mean "XML syntax," but you have
not said so, so I cannot logically make that assumption. The string you're
parsing may merely *contain* XML, mixed with other "syntax."

> So in fact, each word or character that I define in the list of words
> should be colored.

Not necessarily. See my example (about "public") above. You need to be
*absolutely specific*.

> - spaces always define the beginning and end of a word:

Are you certain of this? What about line breaks? Might any of these "words"
be at the beginning or end of the string? If so, a word at the beginning will
not be preceded by a space, and a word at the end will not be followed by one.
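Here is a small Python sketch, for illustration only, of why a literal space is not a reliable delimiter. The pattern shown is one option, not the only one. It uses lookarounds that require "no non-whitespace character" on each side. That condition also holds at the start and end of the text and next to line breaks or tabs.

```python
import re

text = "public\nclass Foo"

# Demanding literal spaces misses "public" at the very start of the text
# and "class" right after the line break.
print(re.findall(r" (public|class) ", text))  # []

# (?<!\S) and (?!\S) succeed at the string edges and next to any whitespace.
print(re.findall(r"(?<!\S)(public|class)(?!\S)", text))  # ['public', 'class']
```

Note that this rule still treats "public(" as not a word, so whether it is the *right* rule depends on the syntax you are highlighting.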

> <?xml version="1.0" encoding="utf-8" ?> -> I need to identify the <,
> ?, xml, version, encoding in order to highlight these in various
> colors.

Okay, see, now you want to identify the '?' in an XML tag. But that is not a
word character, nor is it delimited from "xml" by a space. Again, the syntax
of the Regular Expression depends upon an *absolutely specific* description
of the rules for matching and grouping.
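Here is a Python sketch, again purely illustrative, of what an explicit set of rules might look like for this one line. It assumes the input is nothing but the XML declaration shown. The token class names are made up, and the name rule is simplified rather than the full XML `Name` production. Instead of hunting for "words," it walks the text left to right and assigns every character to exactly one kind of token, so '<' and '?' each get their own token.

```python
import re

# One alternative per (hypothetical) highlighting class.
XML_TOKEN_RE = re.compile(r'''
      (?P<bracket>[<>])             # '<' or '>'
    | (?P<question>\?)              # '?' of the declaration
    | (?P<value>"[^"]*")            # double-quoted attribute value
    | (?P<equals>=)
    | (?P<name>[A-Za-z_][\w.-]*)    # simplified tag/attribute name
    | (?P<space>\s+)
''', re.VERBOSE)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    pos, tokens = 0, []
    while pos < len(text):
        m = XML_TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for kind, lexeme in tokenize('<?xml version="1.0" encoding="utf-8" ?>'):
    print(kind, lexeme)
```

Any character the rules don't cover raises an error. That is deliberate: it forces you to decide what should happen with it.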

> I hope that you understand what I'm trying to do here...

Not yet, but I hope to!

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

Show me your certification without works,
and I'll show my certification
*by* my works.

