regex for replacing plain text within html string...
- From: "Tim_Mac" <mackey.tim@xxxxxxxxx>
- Date: 20 Jan 2006 10:32:19 -0800
hi,
i have a tricky problem and my regex expertise has reached its limit.
i have read other posts on this newsgroup that pull out the plain text
from a html string, but that won't work for me because i want to
preserve the html, and replace some of the plain text.
i basically want to show the user's search terms highlighted in the
page, like google does, but i want to do this server side (i have the
mechanics of intercepting the html sorted out, by overriding the
Page.Render method). i can use a simple regex pattern like (keyword)
and replace with <span class='highlight'>$1</span> but this causes
problems because the keyword may appear in markup tags or attribute
values, which the above example will also replace, screwing up the html
structure.
what i want to express is: match the keyword, where it is not contained
inside an html tag, i.e. not between a < and the > that closes it.
my most obvious attempt is too simplistic and doesn't work:
[^<]*(keyword)[^>]*
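one possible way to express "not inside a tag" is a negative lookahead: after the keyword, scan forward over characters that are neither < nor >, and reject the match if the next character is a >, since that means the keyword sat inside a tag. below is a minimal sketch in python (the lookahead syntax is the same in the .NET Regex class). the function name highlight and the sample html are made up for illustration. it assumes no stray < or > appears in text or attribute values, and it makes no attempt to skip script or style blocks or comments.

```python
import re

def highlight(html, keyword):
    # Hypothetical helper: wrap keyword occurrences that lie outside tags.
    # The lookahead (?![^<>]*>) rejects a match when the text following it
    # reaches a '>' before any '<', i.e. when we are inside a tag.
    pattern = re.compile(
        "(" + re.escape(keyword) + ")(?![^<>]*>)",
        re.IGNORECASE,
    )
    # \1 re-inserts the matched text, so the original casing is preserved.
    return pattern.sub(r"<span class='highlight'>\1</span>", html)

sample = "<a href='cat.html' title='cat'>cat and cat</a>"
print(highlight(sample, "cat"))
```

because each keyword is matched on its own rather than as part of a larger tag-to-tag match, every occurrence in the text gets wrapped, while the ones in the href and title attributes are left alone.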
i did come up with another regex which i am almost embarrassed to show
:)
it essentially matches the keyword inside the inner text of a html tag
set. but the problem is that it misses subsequent occurrences of the
keyword in the same match.
here is the pattern:
<(?<tag>\w+)([^>]*>[^<]*)(?<innerText>KeyWord)([^<]*</\k<tag>>)
and the replace: <$3$1<span class='highlight'>$4</span>$2
it actually works, but as i mentioned it does miss multiple occurrences
inside the same tag, and requires all the text to be within an open +
close html tag.
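both limitations (missed repeats, and text that must sit inside an open + close pair) come from matching the tags and the keyword in one pattern. an alternative sketch is to split the string into tags and text runs first, then replace only inside the text runs. this is shown in python purely as an illustration. the names TAG and highlight_text_nodes are hypothetical, and the tag pattern <[^>]*> is an assumption that breaks on a > inside an attribute value and ignores script/style content.

```python
import re

# The capturing group makes split() keep the tags in the result list.
TAG = re.compile(r"(<[^>]*>)")

def highlight_text_nodes(html, keyword):
    # Hypothetical helper: highlight the keyword in text runs only.
    word = re.compile(re.escape(keyword), re.IGNORECASE)
    out = []
    # split() alternates: even indexes are text runs, odd indexes are tags.
    for i, part in enumerate(TAG.split(html)):
        if i % 2 == 1:
            out.append(part)  # a tag: copy through unchanged
        else:
            out.append(word.sub(
                lambda m: "<span class='highlight'>" + m.group(0) + "</span>",
                part,
            ))
    return "".join(out)

sample2 = "<p class='cat'>cat, cat</p> stray cat"
print(highlight_text_nodes(sample2, "cat"))
```

since the keyword pattern never sees tag content, it needs no knowledge of html at all, and text outside any element (like the trailing "stray cat") is handled the same way as text inside one.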
i would be really grateful if anyone had a suggestion
thanks
tim