RE: regex puzzle!
From: Andrew (Andrew_at_discussions.microsoft.com)
Date: 11/23/04
- Next message: Bonj: "Re: How good an encryption algorithm is this?"
- Previous message: clintonG: "VB to C# Conversion Advice..."
- In reply to: G. Stewart: "regex puzzle!"
- Next in thread: G. Stewart: "Re: regex puzzle!"
- Reply: G. Stewart: "Re: regex puzzle!"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 23 Nov 2004 13:47:03 -0800
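Regexes can't count: a plain regular expression has no way to track how
many tags are still open, which is what pairing each opening tag with its
closing tag requires. You may need to look into some grammar (parsing)
tools to accomplish this, or write some custom code.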
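For the custom-code route, here is a minimal sketch in Python (my choice of language; nothing in the thread fixes one). It relies on the assumptions stated in the quoted question: the block is well formed, every opening tag has a closing tag, and there are no self-closing tags. The names `TAG` and `truncate_html` are made up for illustration. A regex is used only to find the tags; the counting is done by an ordinary integer.

```python
import re

# Matches one opening or closing tag, e.g. <b>, </li>, <a href="...">.
# Assumption: no '>' appears inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def truncate_html(html: str, n: int) -> str:
    """Return at least the first n characters of html, extended so that
    no tag is cut in half and every opened tag is also closed."""
    depth = 0  # number of tags currently open
    pos = 0    # index just past the last tag seen so far
    for m in TAG.finditer(html):
        # Nothing is open and the limit falls at or before this tag:
        # cut at n, or at the end of the previous tag if n fell inside
        # the markup that has just been closed.
        if depth == 0 and m.start() >= n:
            return html[:max(n, pos)]
        if m.group(1):               # closing tag
            depth = max(depth - 1, 0)
        else:                        # opening tag
            depth += 1
        pos = m.end()
    # No tags left. If something is still open the input was not well
    # formed, so fall back to returning everything.
    return html[:max(n, pos)] if depth == 0 else html


sample = 'One <i>two three</i> four <a href="http://example.com/">five</a> six'
print(truncate_html(sample, 10))  # One <i>two three</i>
```

With n = 30 the limit lands inside the anchor tag, so the result runs on to the closing </a>. A depth counter is enough here only because the input is assumed to be well formed; checking that tag names actually match would need a stack instead.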
"G. Stewart" wrote:
> The objective is to extract the first n characters of text from an
> HTML block. I wish to preserve all HTML (links, formatting etc.), and
> at the same time, extend the size of the block to ensure that all
> closing tags are recovered.
>
> For example, simply extracting the first 400 characters of an HTML
> block may result in an <i> opening tag being included, but its
> closing tag being excluded. Or a link may get chopped halfway - [...
> blah blah <a href="ht] may be the last few characters of the recovered
> phrase.
>
> Ideally, if any html opening tag is included in the first n
> characters, then any number of extra characters should continue to be
> extracted from the source block until all paired closing tags are
> found.
>
> We can assume that the source block is well-formed HTML, and every
> opening tag has a closing tag (whether optional or not). Furthermore
> (if it makes any difference), we can assume that all tags are given in
> their simplest forms with no attributes (e.g. <p>, <ul>, <li>, <b>),
> except for anchor tags, which have the href attribute of course.
>
> Can anyone suggest a regular expression to do this?
>