Re: regular expression question
- From: Ludwig <none@xxxxxxxx>
- Date: Sat, 25 Mar 2006 16:38:42 +0100
On Sat, 25 Mar 2006 09:46:08 -0500, "Kevin Spencer"
<kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi Ludwig,
It is not possible to answer your question as you've stated it. Here's why:
I'm using the regular expression \b\w to find the beginning of a word,
in my C# application. If the word is 'public', for example, it works.
However, if the word is '<public', it does not work: it seems that <
is not a valid character, so the beginning of the word starts at
the letter 'p' instead of '<'.
You have not defined your terms. You use the word "word," but you have not
defined what that is supposed to mean in your situation. In regular
expressions, there are no words, only characters. The "\w" character class
indicates a word *character*. A word character is defined in regular
expressions as a character that is a letter of the alphabet, a digit, or
the underscore.
So, the character '<' is not defined in regular expressions as a word
character, and therefore is not identified as belonging to the set defined
by your rule.
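
To make the behaviour concrete, here is a minimal sketch. It uses Python's `re` module purely as a runnable stand-in (an assumption of this example, not the C# code under discussion); for these ASCII inputs, `\b` and `\w` behave the same way in .NET's Regex class. The helper name `first_match` is hypothetical.

```python
import re

# \b\w matches the first word character that follows a word boundary.
FIRST_WORD_CHAR = re.compile(r"\b\w")

def first_match(text):
    """Return (index, character) of the first match, or None if there is none."""
    m = FIRST_WORD_CHAR.search(text)
    return (m.start(), m.group()) if m else None

print(first_match("public"))   # (0, 'p')
print(first_match("<public"))  # (1, 'p')  the '<' is skipped
print(first_match("<<"))       # None: no word character at all
```

The '<' is skipped because there is no word boundary before it: a boundary only exists where a word character sits next to a non-word character (or the edge of the string).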
However, while you have stated that you *do* want to identify the character
'<' as the "beginning of a word," you have not stated exactly what the rule
is, only a small part of it. For example, by what you've told me, the
following character sequences could all be "words" -
Hello Ludwig ('H', 'L') The first letters of each word are identified.
Hello, <Ludwig> ('H', '<') The first letter of "Hello" and the beginning '<'
are identified.
Hello, !!!!!!! ('H', '!') The first letter of "Hello" and the beginning '!'
are identified. This is possible because you have not stated what characters
you do *not* consider to be the beginnings of words.
And so on. In other words, a regular expression is shorthand for a rule that
defines a pattern. You need to explicitly define what the rule is in order
for me to create a regular expression that satisfies that rule.
Thanks for the explanation, Kevin!
Well, I'm working on an editor control that supports syntax
highlighting. I have a list of words that should be highlighted when
typed in the editor, for example 'public', 'class', etc.
So at a given time, the user types in the word public, and when the
character 'c' is typed, the word 'public' is colored in blue, for
example.
At the moment, I use the pattern '\b\w' to identify the first
character of the 'word' in the editor, and I use '\w\b' to identify
the last character of a word. This works.
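
As an illustration of what these two patterns find, here is a small sketch. It is written with Python's `re` module as a runnable stand-in rather than the C# code from the editor, and `word_spans` is a hypothetical helper name. It pairs each `\b\w` match (first character) with the corresponding `\w\b` match (last character) to get the span of every word.

```python
import re

def word_spans(text):
    """Return (start, end) index pairs for every run of word characters."""
    # Each run of word characters yields exactly one \b\w match (its first
    # character) and one \w\b match (its last character), so the lists align.
    starts = [m.start() for m in re.finditer(r"\b\w", text)]
    ends = [m.end() for m in re.finditer(r"\w\b", text)]
    return list(zip(starts, ends))

text = "public class <sometagname>"
for start, end in word_spans(text):
    print(start, end, text[start:end])
# 0 6 public
# 7 12 class
# 14 25 sometagname
```

Note that '<' and '>' never appear in the result: they are not word characters, so neither pattern can start or end on them.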
However, there are also XML tags that need to be highlighted; for
example, <sometagname> : if the user types in the '<', it should be
colored; if he then types the last 'e' of 'sometagname', the word
'sometagname' should be colored; if he then types '>', that too should
be colored.
So in fact, each word or character that I define in the list of words,
should be colored.
This list of words can be (for example): public, class, int, long,
byte, byte[], <, >, sometagname, generic<>, etc....
So, if I try to define the rule:
- spaces always define the beginning and end of a word:
public class Test() -> I need to identify the public, class, Test()
- there are characters that are not separated by spaces but that also
have to be found when typed:
<?xml version="1.0" encoding="utf-8" ?> -> I need to identify the <,
?, xml, version, encoding in order to highlight these in various
colors.
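
One possible way to express this rule, sketched under several assumptions: the pattern is built directly from the highlight list instead of relying on \b\w. The code is Python (`re`) as a runnable stand-in, not C#; .NET offers Regex.Escape for the same escaping step. The token list and the names `HIGHLIGHT_TOKENS`, `build_pattern` and `find_tokens` are hypothetical, taken from or invented for the examples in this post.

```python
import re

# Hypothetical highlight list, based on the examples in this post.
HIGHLIGHT_TOKENS = ["public", "class", "int", "long", "byte", "byte[]",
                    "<", ">", "?", "sometagname", "generic<>",
                    "xml", "version", "encoding"]

def is_word_char(ch):
    return ch.isalnum() or ch == "_"

def build_pattern(tokens):
    """Build one alternation that matches any token in the list."""
    parts = []
    # Longest first, so "byte[]" wins over "byte" and "generic<>" over "<".
    for tok in sorted(tokens, key=len, reverse=True):
        piece = re.escape(tok)
        # Guard word-like edges so "int" does not match inside "print".
        if is_word_char(tok[0]):
            piece = r"(?<!\w)" + piece
        if is_word_char(tok[-1]):
            piece = piece + r"(?!\w)"
        parts.append(piece)
    return re.compile("|".join(parts))

def find_tokens(text, pattern):
    """Return (index, token) for every highlightable token in text."""
    return [(m.start(), m.group()) for m in pattern.finditer(text)]

pattern = build_pattern(HIGHLIGHT_TOKENS)
print(find_tokens('<?xml version="1.0" encoding="utf-8" ?>', pattern))
# [(0, '<'), (1, '?'), (2, 'xml'), (6, 'version'), (20, 'encoding'), (37, '?'), (38, '>')]
```

Each match could then be looked up in a table that maps the token to its color. This sketch only handles fixed tokens; anything that varies (such as Test() above) would need its own rule.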
I hope that you understand what I'm trying to do here...
Kind regards,
Ludwig