Re: Regex doubt
From: Ben Lucas (ben_at_nospam.solien.nospam.com)
Date: 10/26/04
Date: Tue, 26 Oct 2004 09:56:14 -0700
A good friend of mine recently posted an article on his blog regarding using
regular expressions to match HTML. His article can be found at:
http://haacked.com/archive/2004/10/25/1471.aspx
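
Independently of that article, the "say what you don't want" idea from your post can be sketched with a negated character class instead of a negative lookahead. The snippet below is only a rough illustration in Python rather than .NET (the lookaround syntax used here is the same in both); the names TEXT_RUN and text_outside_tags are made up for this example. It assumes reasonably well-formed markup and does not handle comments, script/style blocks, or a literal '>' inside an attribute value.

```python
import re

# A text run is one or more characters that are neither '<' nor '>'.
# It must start at the beginning of the input or right after a '>',
# and end right before a '<' or at the end of the input, so anything
# inside a tag is skipped no matter how deeply the tags are nested.
TEXT_RUN = re.compile(r"(?:^|(?<=>))[^<>]+(?=<|$)")

def text_outside_tags(html):
    """Return the non-empty text runs found outside of tags."""
    runs = (m.group().strip() for m in TEXT_RUN.finditer(html))
    return [run for run in runs if run]

if __name__ == "__main__":
    print(text_outside_tags("<p>Hello <b>world</b><img/> again</p>"))
```

Because the pattern never tries to pair an opening tag with a closing tag, nesting stops being a problem and no recursion is needed. For real pages, an HTML parser is likely to be more robust than any single expression.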
Hope this helps.
--
Ben Lucas
Lead Developer
Solien Technology, Inc.
www.solien.com

"Sriram Krishnan" <ksriram@NOSPAMgmx.net> wrote in message
news:OXjLTu3uEHA.1616@TK2MSFTNGP10.phx.gbl...
> I'm doing some search-engine related work and want to match the actual
> content of an HTML page (i.e. any character which is not between a < and a
> >). I first wrote
>
> (?:\<.*?>) (?<content>.*?) <?:\<.*?>)
>
> which basically says match any text between an opening and a closing tag.
> The problem with this is that you almost always have nested tags. This exp
> is braindead as it chokes on nested tags. So this would match something
> like/ (this would match the '<img/>' part).
>
> So I came up with
>
> (?![\<|>].*?>)
>
> But the problem with this negative lookahead is that it doesn't advance
> beyond the first negation - it just stops there. I have a feeling that
> saying what I *don't* want is the way to go.
>
> I'm a bit of a newbie to RegEx - and I'm trying to write a RegEx which
> says something like - 'match any text that doesn't match this expression'.
> Or is there any way to do recursive regex matching - that is, within a
> pattern, match the pattern itself? In that case, the first pattern could be
> made to work as I could have a recursive call inside the (?<content>)
> pattern which keeps going down until you don't have any more nested tags
>
> Thanks in advance
>
> --
> Sriram Krishnan
>
> http://www.dotnetjunkies.com/weblog/sriram