RE: Parse HTML DOM document in console application

From: Rowland Shaw (RowlandShaw_at_discussions.microsoft.com)
Date: 10/05/04


Date: Tue, 5 Oct 2004 04:01:02 -0700

You may be interested in this article, although its examples are in C#:

http://www.vsj.co.uk/articles/display.asp?id=389
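
For what it is worth, the task in the quoted question (find a specific tag and pull out its href, with nothing displayed) does not strictly need a browser control. Below is a minimal sketch of that idea in Python rather than .NET, so treat it as an illustration of the approach and not as the method from the article above. The names HrefCollector, find_hrefs and fetch_html are made up for this example, and the UTF-8 default encoding is an assumption. Unlike the WebBrowser control, this parser does not run scripts, so links that a page builds with JavaScript would not show up.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Collects the href value of every occurrence of one tag name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call,
        # so <A HREF="..."> and <a href="..."> are treated alike.
        if tag == self.tag_name:
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def find_hrefs(html_text, tag_name="a"):
    """Return the href values of all <tag_name> elements, in document order."""
    collector = HrefCollector(tag_name)
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


def fetch_html(url, encoding="utf-8"):
    """Download a page as text. The encoding is an assumption; undecodable
    bytes are replaced rather than raising an error."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")
```

A console program would then call find_hrefs(fetch_html(url)) and print or filter the returned list; passing "link" or "area" as tag_name searches those tags instead of anchors.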

"John Williams" wrote:

> How do I load an HTML page (via URL) and parse the DOM in a Console
> Application?
>
> I've successfully done all this in a Windows Application by using the
> WebBrowser control, calling the Navigate method on the specified URL, and
> then, within the DocumentComplete event, parsing the HTML page using
> mshtml.HTMLDocument.
>
> I'm writing it as a console app because I don't need to display the HTML,
> just search for a specific tag and retrieve an href value from it.
>
> Thanks for any help on this.