Re: regex puzzle!
From: G. Stewart (galenstewart_at_yahoo.com)
Date: 11/24/04
- Next message: Yonas Hagos: "Re: translate code into c# from vb.net"
- Previous message: landagen: "RE: Securing hashing algorithm"
- In reply to: Andrew: "RE: regex puzzle!"
- Next in thread: Niki Estner: "Re: regex puzzle!"
- Messages sorted by: [ date ] [ thread ]
Date: 23 Nov 2004 20:19:06 -0800
Hmmm ... All right. I was hoping for something quick and efficient, but
it looks like I might have to do things the hard way - extract the
n-character block, count opening tags, count closing tags, then
continue extracting characters from the source block until all opening
tags have paired closing tags.
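
For reference, the count-and-extend approach described above might be
sketched roughly as follows. This is an illustration only, written in
Python rather than .NET. The names TAG and extract_balanced are
hypothetical, and the sketch leans on the assumptions from the original
question: the block is well formed, and every opening tag (including
<li>, <p> and so on) has an explicit closing tag. A bare <br> or <img>
would throw the count off.

```python
import re

# Matches an opening tag (<p>, <a href="...">) or a closing tag (</p>).
# Group 1 is "/" for a closing tag and empty for an opening tag.
TAG = re.compile(r"<(/?)[a-zA-Z][^>]*>")


def extract_balanced(html, n):
    """Return the first n characters of html, extended as far as needed
    so that no tag is cut in half and every opened tag is closed.

    Assumes well-formed HTML in which every opening tag has a closing
    tag (hypothetical helper, not a library function).
    """
    depth = 0  # opening tags seen minus closing tags seen
    end = 0    # offset just past the last tag consumed
    for m in TAG.finditer(html):
        # Past the cut point with nothing left open: safe to stop.
        if depth == 0 and m.start() >= n:
            break
        depth += -1 if m.group(1) else 1
        end = m.end()
    # Cut at n, or just after the last consumed tag if that lies beyond n.
    return html[:max(n, end)]


if __name__ == "__main__":
    sample = 'see <a href="http://x.y">here</a> now'
    # A plain sample[:10] would chop the link in the middle of the href.
    print(extract_balanced(sample, 10))
```

Note that n here counts raw characters including the markup itself, as
in the original question. The regex is only used to find the tags; the
counting is done in ordinary code.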
Andrew <Andrew@discussions.microsoft.com> wrote in message news:<72F30C38-7F6A-4A53-B056-5AB8A8A36973@microsoft.com>...
> Regexes don't count. You may need to look into some grammar tools to
> accomplish this. Or write some custom code.
>
> "G. Stewart" wrote:
>
> > The objective is to extract the first n characters of text from an
> > HTML block. I wish to preserve all HTML (links, formatting etc.), and
> > at the same time, extend the size of the block to ensure that all
> > closing tags are recovered.
> >
> > For example, simply extracting the first 400 characters of an HTML
> > block may result in an <i> opening tag being included, but its
> > closing tag being excluded. Or a link may get chopped halfway - [...
> > blah blah <a href="ht] may be the last few characters of the recovered
> > phrase.
> >
> > Ideally, if any html opening tag is included in the first n
> > characters, then any number of extra characters should continue to be
> > extracted from the source block until all paired closing tags are
> > found.
> >
> > We can assume that the source block is well-formed HTML, and every
> > opening tag has a closing tag (whether optional or not). Furthermore
> > (if it makes any difference), we can assume that all tags are given in
> > their simplest forms with no attributes (e.g. <p>, <ul>, <li>, <b>),
> > except for anchor tags, which have the href attribute of course.
> >
> > Can anyone suggest a regular expression to do this?
> >