Re: HTML parsing


Carmen Sei <fatwallet951@xxxxxxxxx> wrote in
news:gjvot31p124cmnnr6fae2lb1ujgikhp9og@xxxxxxx:

Looks like HTMLTidy is a good solution:

http://sourceforge.net/projects/tidy


You could, or you could use a third-party library to do that. I
recommend http://www.pcre.org/ , which implements Perl Compatible
Regular Expressions and is used by a lot of well-known open source
projects. If you use the C++ wrappers, which are freely downloadable
from that site, matching and/or replacing is pretty easy.

Nathan Mates
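
To make the regex approach above concrete, here is a minimal sketch of
pulling link targets out of a string of HTML. It is an illustration, not
code from this thread: it uses std::regex from the C++11 standard library
instead of PCRE so that it builds without external dependencies, and the
helper name extract_hrefs and the pattern itself are hypothetical. The
pattern only handles double-quoted href attributes, so treat it as a
starting point for simple, predictable markup.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect the values of
// double-quoted href attributes found in a string of HTML.
// Assumption: std::regex is swapped in for PCRE to avoid a dependency.
std::vector<std::string> extract_hrefs(const std::string& html) {
    // Case-insensitive match of href="..."; capture group 1 is the URL.
    static const std::regex href_re("href\\s*=\\s*\"([^\"]*)\"",
                                    std::regex::icase);
    std::vector<std::string> urls;
    // Walk every non-overlapping match in the input.
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        urls.push_back((*it)[1].str());
    }
    return urls;
}
```

Calling extract_hrefs on a page fragment returns the URLs in document
order, and an input with no matching attributes yields an empty vector.
Single-quoted or unquoted attribute values are not matched by this
pattern, which is one reason to normalize the markup first.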

I use Tidy in my project, and it works quite nicely.

After using Tidy to convert the HTML into XML, I then use Xerces to parse
the XML. (I have both compiled statically into my executable.)

Dave Connet