Re: Convert HTML to text
- From: "Veign" <NOSPAMinveign@xxxxxxxxx>
- Date: Thu, 23 Mar 2006 11:29:05 -0500
Here's an option:
http://www.veign.com/vrc_codeview.asp?type=app&id=130
I have used the above in conjunction with regular expressions to parse out
all data from a webpage. As far as getting around embedded CSS, that's
pretty easy: 1) It will either be contained within a <STYLE> tag, so you
could throw out the tag and everything between its opening and closing
tags, or 2) it will be contained within a Style attribute of a tag, so when
you parse out the tag you could throw out this attribute, or all
attributes, and only get the data between tags.
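To make the two cases above concrete, here is a rough sketch in Python (not the code behind the link above, just an illustration). The function name html_to_text and the "text [url]" link format are my own assumptions. It throws out <STYLE> and <SCRIPT> blocks, keeps link targets, then drops every remaining tag along with its attributes. Regular expressions like these will trip over malformed markup or a ">" inside an attribute value, so treat it as a starting point only.

```python
import html
import re

# Case 1: drop <style>/<script> elements and everything between the tags.
BLOCK_RE = re.compile(
    r"<(style|script)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Capture the href value and the link text of <a> elements so links survive.
LINK_RE = re.compile(
    r"<a\b[^>]*?\bhref\s*=\s*([\"'])(.*?)\1[^>]*>(.*?)</a\s*>",
    re.IGNORECASE | re.DOTALL,
)
# Case 2: any remaining tag goes, attributes (including style="...") and all.
TAG_RE = re.compile(r"<[^>]+>")


def html_to_text(markup: str) -> str:
    """Hypothetical helper: reduce an HTML string to plain text."""
    text = BLOCK_RE.sub("", markup)        # remove embedded CSS and JS
    text = LINK_RE.sub(r"\3 [\2]", text)   # rewrite links as "text [url]"
    text = TAG_RE.sub(" ", text)           # keep only the data between tags
    text = html.unescape(text)             # decode entities such as &amp;
    return re.sub(r"\s+", " ", text).strip()
```

Since the page is already on disk, you would read the file into a string and pass it straight in, e.g. html_to_text(open(path, encoding="utf-8").read()), where path is whatever local file you have.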
--
Chris Hanscom - Microsoft MVP (VB)
Veign's Resource Center
http://www.veign.com/vrc_main.asp
Veign's Blog
http://www.veign.com/blog
--
"Tomasz Klim" <tklim-ms@xxxxxxxxxxxxxx> wrote in message
news:eELumRlTGHA.1576@xxxxxxxxxxxxxxxxxxxxxxx
HTML is Text. You rule out regular expressions but in your case that
may be the best, fastest, and easiest way to extract out all HTML tags
from a webpage.
Yes, I know, but when the page contains "advanced" content, like embedded
CSS, or even JS, regular expressions won't remove it. Furthermore, I want
to preserve all links from the original html code.
Also, you say without downloading a page. That's how the web works. You
are not viewing the contents of the page on the server but a local copy
that has been downloaded to your system from the server - no way around
it.
You misunderstood me. I was trying to say that I already have the page on
my local disk.
So?
"Veign" <NOSPAMinveign@xxxxxxxxx> wrote in message
news:elYTz1hTGHA.2276@xxxxxxxxxxxxxxxxxxxxxxx
HTML is Text. You rule out regular expressions but in your case that
may be the best, fastest, and easiest way to extract out all HTML tags
from a webpage.
Also, you say without downloading a page. That's how the web works. You
are not viewing the contents of the page on the server but a local copy
that has been downloaded to your system from the server - no way around
it.
"Tomasz Klim" <tklim-ms@xxxxxxxxxxxxxx> wrote in message
news:%23LKkMPgTGHA.3976@xxxxxxxxxxxxxxxxxxxxxxx
Hi
I'm looking for an HTML-to-text converter. Not a simple one, and not just
a regular expression, but one that can also convert complicated pages with
embedded styles, dynamic content, etc.
Does anybody know of such a tool?
I know that Internet Explorer can load a web page and dump it in both
HTML and text format, but I need something faster that doesn't need to
download the page, just convert it... Do you perhaps know how to use IE
for this?
Thanks in advance!