Re: regex puzzle!


From: G. Stewart (galenstewart_at_yahoo.com)
Date: 11/24/04


Date: 23 Nov 2004 20:19:06 -0800

Hmm ... All right. I was hoping for something quick and efficient, but
it looks like I might have to do things the hard way: extract the
n-character block, count opening tags, count closing tags, then
continue extracting characters from the source block until all opening
tags have paired closing tags.
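The counting approach described above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions taken from the question: the source is well-formed, every opening tag has a closing tag (so no void tags like <br> and no self-closing tags), there are no comments, and no attribute value contains ">". It also assumes n counts raw characters of the HTML source rather than visible text. The names `truncate_html` and `TAG_RE` are hypothetical. Before counting, it moves the cut past any tag that was chopped in half (the `<a href="ht` case). Because the input is assumed to be properly nested, a single depth counter (opening tags minus closing tags) is enough.

```python
import re

# Hypothetical pattern: matches one tag. Group 1 is "/" for a closing tag,
# group 2 is the tag name. Assumes no ">" inside attribute values.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def truncate_html(source: str, n: int) -> str:
    """Return at least the first n characters of source, extended so that
    no tag is cut in half and every opening tag has its closing tag.
    Assumes well-formed input where every opening tag is closed."""
    if n >= len(source):
        return source

    cut = n
    # If the cut lands inside a tag (a "<" with no ">" before the cut),
    # move the cut to just past that tag's ">".
    lt = source.rfind("<", 0, cut)
    if lt != -1 and source.find(">", lt, cut) == -1:
        cut = source.index(">", lt) + 1

    # Count opening tags (+1) and closing tags (-1) in the kept block.
    depth = 0
    for m in TAG_RE.finditer(source, 0, cut):
        depth += -1 if m.group(1) else 1

    # Keep taking the next tag until every opening tag has been closed.
    while depth > 0:
        m = TAG_RE.search(source, cut)
        if m is None:  # input was not balanced after all; keep everything
            return source
        depth += -1 if m.group(1) else 1
        cut = m.end()

    return source[:cut]
```

For example, `truncate_html('<p>One <i>two</i></p><p>Three</p>', 8)` cuts inside the `<i>` tag, extends past it, and keeps going until both `<i>` and `<p>` are closed, returning `'<p>One <i>two</i></p>'`.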

Andrew <Andrew@discussions.microsoft.com> wrote in message news:<72F30C38-7F6A-4A53-B056-5AB8A8A36973@microsoft.com>...
> Regexes don't count. You may need to look into some grammar tools to
> accomplish this. Or write some custom code.
>
> "G. Stewart" wrote:
>
> > The objective is to extract the first n characters of text from an
> > HTML block. I wish to preserve all HTML (links, formatting etc.), and
> > at the same time, extend the size of the block to ensure that all
> > closing tags are recovered.
> >
> > For example, simply extracting the first 400 characters of an HTML
> > block may result in an <i> opening tag being included but its
> > closing tag being excluded. Or a link may get chopped halfway: [...
> > blah blah <a href="ht] may be the last few characters of the recovered
> > phrase.
> >
> > Ideally, if any html opening tag is included in the first n
> > characters, then any number of extra characters should continue to be
> > extracted from the source block until all paired closing tags are
> > found.
> >
> > We can assume that the source block is well-formed HTML, and every
> > opening tag has a closing tag (whether optional or not). Furthermore
> > (if it makes any difference), we can assume that all tags are given in
> > their simplest forms with no attributes (e.g. <p>, <ul>, <li>, <b>),
> > except for anchor tags, which have the href attribute of course.
> >
> > Can anyone suggest a regular expression to do this?
> >
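Andrew's point can be illustrated with a small Python sketch: a regular expression can recognize individual tags in the restricted form the question allows (bare tags such as <p> or <li>, plus anchors with an href attribute), but pairing them up needs a stack or counter kept outside the regex. The pattern and the function names `tag_events` and `is_properly_nested` are hypothetical, and the pattern assumes href values are double-quoted.

```python
import re

# Hypothetical pattern for the restricted tag forms in the question:
# <name> or </name>, with an optional double-quoted href (meant for <a>).
TAG_RE = re.compile(r'<(/)?([a-z][a-z0-9]*)(?:\s+href="[^"]*")?>', re.IGNORECASE)


def tag_events(html):
    """Yield (is_closing, name, start, end) for each tag, in document order."""
    for m in TAG_RE.finditer(html):
        yield (m.group(1) == "/", m.group(2).lower(), m.start(), m.end())


def is_properly_nested(html):
    """The regex only finds tags; this stack does the pairing it cannot."""
    stack = []
    for closing, name, _, _ in tag_events(html):
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # closing tag with no matching opener
    return not stack  # anything left open means unbalanced input
```

The same tag events could drive the truncation: note the depth after the first n characters and keep consuming events until the stack is empty.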


