Re: Regular Expression Matches
- From: "Pete Davis" <pdavis68@[nospam]hotmail.com>
- Date: Tue, 21 Feb 2006 13:15:58 -0600
Well, there are 3 groups. <caturl> is a group name. The other 2 are unnamed.
Why do they need to be named if <caturl> is named? I'm not interested in the
other groups. I'm simply using them as "delimiters" for lack of a better
word.
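As a small illustration of the point about mixing named and unnamed groups, here is a minimal sketch. The class name, method names, and sample input are hypothetical and not taken from the page discussed in the thread. It assumes the .NET behavior that unnamed groups are numbered first, left to right, and named groups are numbered after them, and it shows the non-capturing form (?:...) as one way to write groups that only act as delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingSketch
{
    // Hypothetical sample input, not the real page from the thread.
    public const string Sample = "<a href=/cat/books class=title>[ Books ]</a>";

    // One named group between two unnamed groups, as in the thread's expression.
    public static Match MatchMixed()
    {
        var regex = new Regex(@"(href=)(?<caturl>.*)(class=title>\[)");
        return regex.Match(Sample);
    }

    // Same expression with non-capturing delimiters, so only "caturl" is captured.
    public static Match MatchNonCapturing()
    {
        var regex = new Regex(@"(?:href=)(?<caturl>.*)(?:class=title>\[)");
        return regex.Match(Sample);
    }
}
```

In both variants the URL is read with match.Groups["caturl"].Value, so the unnamed groups do not have to be named for the named one to work.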
I've modified the expression to look like this:
(href=)(?<caturl>.*)(class=title.*\[ )
This gives the exact same results.
The escaping stuff gets a little confusing because the regular expressions
are actually stored in an XML file, so they get escaped for that.
In the XML file that looks like:
(href=)(?<caturl>.*)(class=title.*\[&nbsp;&nbsp;&nbsp;)
This still isn't returning multiple results. Just the last match. I don't
think the < was the problem.
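One possible explanation, which the thread does not confirm, is greedy matching: .* takes as much text as it can, so if several links end up on one line of the input, a single match can run from the first href= to the last class=title. The sketch below compares the greedy expression with a lazy variant (.*?). The class name, helper, and sample text are hypothetical, and the trailing space after \[ is left out to keep the example small.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVersusLazySketch
{
    // Hypothetical input: two links with no newline between them.
    public const string OneLinePage =
        "<a href=/cat/a class=title>[ A ]</a><a href=/cat/b class=title>[ B ]</a>";

    // Greedy quantifiers, as in the thread's modified expression.
    public const string Greedy = @"(href=)(?<caturl>.*)(class=title.*\[)";

    // Lazy quantifiers: each .*? stops at the first place the rest can match.
    public const string Lazy = @"(href=)(?<caturl>.*?)(class=title.*?\[)";

    // Collects the trimmed "caturl" capture of every match in the text.
    public static List<string> Extract(string pattern, string text)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            urls.Add(m.Groups["caturl"].Value.Trim());
        }
        return urls;
    }
}
```

On this sample the greedy pattern yields a single match spanning both links, while the lazy pattern yields one match per link. Note that RegexOptions.Multiline only changes how ^ and $ behave; it does not stop .* from running across several links on the same line.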
Pete
"Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:%23$t0JWxNGHA.1032@xxxxxxxxxxxxxxxxxxxxxxx
Hi Pete,
You need to escape the '<' and '>' characters in your Regular Expression.
These are used in some flavors of Regular Expression language to indicate
a named group. If the first (<caturl>) is a group name, name both groups
or neither.
--
HTH,
Kevin Spencer
Microsoft MVP
.Net Developer
To a tea you esteem
a hurting back as a wallet.
"Pete Davis" <pdavis68@[nospam]hotmail.com> wrote in message
news:1eSdncq4reeQvWbenZ2dnUVZ_tWdnZ2d@xxxxxxxxxxxxxxx
I'm using regular expressions to extract some data and some links from
some web pages. I download the page and then I want to get a list of
certain links.
For building regular expressions, I use an app called The Regulator, which
makes it pretty easy to build and test regular expressions.
As a warning, I'm real weak with regular expressions. Let's say my
regular expression is:
(href=)(?<caturl>.*)(class=title>\[ )
Now, using The Regulator and giving it the source for a particular web
page, I get 8 matches.
According to The Regulator, the options it's using are:
Multiline, ignore case, ignore whitespace
In my own code, I'm doing:
Regex indexRegex = new Regex(categoryListRegex,
    RegexOptions.Multiline |
    RegexOptions.IgnorePatternWhitespace |
    RegexOptions.IgnoreCase);
MatchCollection indexMatches = indexRegex.Matches(pageText);
This only returns one match in indexMatches with the same page that I'm
giving The Regulator. It seems that no matter what combination of regex
options I use, I'm only getting one match.
Why is that? How do I get all 8 matches?
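One detail of the options above that may be worth checking: with RegexOptions.IgnorePatternWhitespace, unescaped whitespace inside the pattern is ignored, so the literal space after \[ in the expression would not be matched as a space. Whether this plays any part in the problem described here is an assumption, not something the thread establishes. A minimal sketch, with a hypothetical class and helper name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternWhitespaceSketch
{
    // Tests the pattern against the input with IgnorePatternWhitespace enabled.
    public static bool IsMatchIgnoringPatternWhitespace(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnorePatternWhitespace);
    }
}
```

For example, the pattern \[ x then behaves like \[x: it matches "[x" but not "[ x". Escaping the space (\[\ x) or using a character class (\[[ ]x) makes the space significant again.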
Thanks.
pete