Re: Convert HTML to text

Here's an option:
http://www.veign.com/vrc_codeview.asp?type=app&id=130

I have used the above in conjunction with regular expressions to parse out
all data from a webpage. Getting around embedded CSS is pretty easy: either
1) it will be contained within a <STYLE> tag, so you can throw out the tag
and everything between its opening and closing tags, or 2) it will be
contained within a style attribute of a tag, so when you parse out the tag
you can throw out that attribute (or all attributes) and keep only the data
between tags.
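A minimal sketch of that two-case regex approach, written in Python for illustration (the thread itself is about VB). The function name `strip_html` is hypothetical, and removing `<script>` blocks and HTML comments alongside `<STYLE>` is an added assumption beyond the CSS case described above.

```python
import html
import re

# HTML comments can hide markup, so remove them first.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Case 1: a <style> element and everything between its tags. <script> is
# handled the same way here (an assumption beyond the CSS case in the post).
BLOCK_RE = re.compile(r"<(style|script)\b[^>]*>.*?</\1\s*>",
                      re.IGNORECASE | re.DOTALL)
# Case 2: any remaining tag is thrown away together with all of its
# attributes (style="..." included); only text between tags survives.
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def strip_html(page: str) -> str:
    """Return the text between tags, with embedded CSS/JS removed."""
    text = COMMENT_RE.sub(" ", page)
    text = BLOCK_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    # Decode entities last so "&lt;" in the text is never mistaken for a tag.
    text = html.unescape(text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = ('<html><head><style>p {color: red}</style></head>'
              '<body><p style="color: blue">Hello <b>world</b></p></body></html>')
    print(strip_html(sample))  # Hello world
```

Regexes like these are fast but brittle: a `>` inside a quoted attribute value, for example, will cut a tag short, which is where a real HTML parser becomes the safer choice.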


--
Chris Hanscom - Microsoft MVP (VB)
Veign's Resource Center
http://www.veign.com/vrc_main.asp
Veign's Blog
http://www.veign.com/blog
--


"Tomasz Klim" <tklim-ms@xxxxxxxxxxxxxx> wrote in message
news:eELumRlTGHA.1576@xxxxxxxxxxxxxxxxxxxxxxx
HTML is Text. You rule out regular expressions but in your case that
may be the best, fastest, and easiest way to extract out all HTML tags
from a webpage.

Yes, I know, but when the page contains "advanced" content like embedded
CSS or even JS, regular expressions won't remove it. Furthermore, I want
to preserve all links from the original HTML code.
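For the "drop embedded CSS/JS but keep the links" requirement, one option is to swap regular expressions for a real HTML parser. The sketch below uses Python's standard-library `html.parser.HTMLParser`, which delivers `<style>` and `<script>` bodies as raw data that the class simply ignores. The class name, the `html_to_text` helper, and the `text [url]` output format are illustrative assumptions, not an existing tool. It does not execute JavaScript, so content generated dynamically by scripts is not captured.

```python
from html.parser import HTMLParser


class LinkPreservingTextExtractor(HTMLParser):
    """Collects visible text, skips <style>/<script> bodies, keeps link targets."""

    SKIP_TAGS = {"style", "script"}

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser decode &amp; etc. for us.
        super().__init__(convert_charrefs=True)
        self.parts = []        # text fragments in document order
        self.skip_depth = 0    # > 0 while inside <style> or <script>
        self.href_stack = []   # href of each currently open <a> ("" if none)

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names already lowercased.
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href") or "")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self.href_stack:
            href = self.href_stack.pop()
            if href:
                # Keep the link by writing its target after the link text.
                self.parts.append(f" [{href}]")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def text(self) -> str:
        # Collapse the whitespace left over from layout and removed tags.
        return " ".join("".join(self.parts).split())


def html_to_text(page: str) -> str:
    parser = LinkPreservingTextExtractor()
    parser.feed(page)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = ('<html><head><style>a {color: red}</style>'
              '<script>if (1 < 2) alert("hi");</script></head>'
              '<body><p>See <a href="http://example.com/">this page</a> '
              'for details.</p></body></html>')
    print(html_to_text(sample))  # See this page [http://example.com/] for details.
```

Since the page is already on disk, the input can simply be the file's contents read with `open(path, encoding=...)`; no download or browser is involved.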

Also, you say without downloading a page. That's how the web works. You
are not viewing the contents of the page on the server but a local copy
that has been downloaded to your system from the server - no way around
it.

You misunderstood me. What I meant is that I already have the page on my
local disk.

So?



"Veign" <NOSPAMinveign@xxxxxxxxx> wrote in message
news:elYTz1hTGHA.2276@xxxxxxxxxxxxxxxxxxxxxxx
HTML is Text. You rule out regular expressions but in your case that
may be the best, fastest, and easiest way to extract out all HTML tags
from a webpage.

Also, you say without downloading a page. That's how the web works. You
are not viewing the contents of the page on the server but a local copy
that has been downloaded to your system from the server - no way around
it.



"Tomasz Klim" <tklim-ms@xxxxxxxxxxxxxx> wrote in message
news:%23LKkMPgTGHA.3976@xxxxxxxxxxxxxxxxxxxxxxx
Hi

I'm looking for an HTML-to-text converter. Not a simple one, or even a
regular expression, but one that can also convert complicated pages with
embedded styles, dynamic content, etc.

Does anybody know such tool?

I know that Internet Explorer can load a web page and dump it in both
HTML and text format, but I need something faster that just converts the
page without needing to download it. Do you know how to use IE for this?

Thanks in advance!