Re: Searching web sub-pages
- From: "mayayana" <mayaXXyana1a@xxxxxxxxxxxxxxxx>
- Date: Thu, 30 Aug 2007 22:07:12 -0400
You could probably just parse it, but it might
be easier to use the DOM (after loading the page
into a WebBrowser control or IE instance). If you get
the document.all collection, you can walk it by index,
checking...
   If all(i).tagName = "A" Then
to filter links. An A tag has an "href" property that gives
you the link URL.
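
Roughly, the DOM walk could look like the sketch below. It assumes a script run under Windows Script Host (WScript.Sleep is WSH-only) on a machine where IE automation is available. The function name GetLinksViaIE is made up for illustration.

```vbscript
Const READYSTATE_COMPLETE = 4

' Hypothetical helper: load a page in a hidden IE instance and
' return the href of every A tag, one per line (vbLf-separated).
Function GetLinksViaIE(url)
    Dim ie, els, el, i, result
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate url

    ' Wait until the page has finished loading.
    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        WScript.Sleep 100
    Loop

    ' Walk the all collection by index and keep only the links.
    Set els = ie.Document.all
    result = ""
    For i = 0 To els.length - 1
        Set el = els.item(i)
        If UCase(el.tagName) = "A" Then
            If Len(el.href) > 0 Then result = result & el.href & vbLf
        End If
    Next

    ie.Quit
    GetLinksViaIE = result
End Function
```

You would call it as something like links = Split(GetLinksViaIE("http://www.example.com/"), vbLf), where the URL is only a placeholder. In a VB form with a WebBrowser control, the same loop would run in the DocumentComplete event instead of after a Sleep loop.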
Then you just download each page, or navigate
to it with a WebBrowser control. I assume you
weren't thinking of using the DOM to look for the keywords.
You could make it a little bit easier by getting the
document.body.innerText before searching for
your keywords, but assuming that you're not searching
for words like "TABLE" or "DIV", there's probably not
much point. If it were me, I'd just download the 100
pages as files, so as to avoid the bloat of loading them
in IE and also avoid the security risks of loading 100
pages into IE without knowing for sure that they're
safe. Then I would just parse them with InStr.
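
A minimal sketch of that download-and-InStr approach, under some assumptions: it uses the MSXML2.XMLHTTP object to fetch the page source, keeps the text in memory rather than saving files, and only picks up double-quoted, absolute http links. The names FetchText, HasKeyword, ExtractHrefs and SearchLinkedPages are made up for illustration. Note that a plain search for href=" will also catch non-A tags such as LINK.

```vbscript
' Download a page's source as text, without rendering it in a browser.
Function FetchText(url)
    Dim http
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False        ' False = synchronous request
    http.Send
    If http.Status = 200 Then
        FetchText = http.responseText
    Else
        FetchText = ""
    End If
End Function

' True if keyword occurs anywhere in text, ignoring case.
Function HasKeyword(text, keyword)
    HasKeyword = (InStr(1, text, keyword, vbTextCompare) > 0)
End Function

' Collect double-quoted href values using InStr and Mid.
' Returns the values as one string, each followed by vbLf.
Function ExtractHrefs(html)
    Dim pos, endPos, result
    result = ""
    pos = InStr(1, html, "href=""", vbTextCompare)
    Do While pos > 0
        pos = pos + 6                  ' skip past href="
        endPos = InStr(pos, html, """")
        If endPos = 0 Then Exit Do
        result = result & Mid(html, pos, endPos - pos) & vbLf
        pos = InStr(endPos + 1, html, "href=""", vbTextCompare)
    Loop
    ExtractHrefs = result
End Function

' Fetch the start page, then report each linked page containing keyword.
' Relative links are skipped here; network errors are not handled.
Sub SearchLinkedPages(startUrl, keyword)
    Dim url
    For Each url In Split(ExtractHrefs(FetchText(startUrl)), vbLf)
        If LCase(Left(url, 4)) = "http" Then
            If HasKeyword(FetchText(url), keyword) Then WScript.Echo url
        End If
    Next
End Sub
```

A call would look like SearchLinkedPages "http://www.example.com/", "keyword" (both arguments are placeholders). WScript.Echo assumes Windows Script Host; in VB you would use Debug.Print or add the URL to a list instead.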
Okay, here's the scenario:
I've got a web page. That web page has many, MANY links (hundreds, I
think...I haven't counted). I want to search those pages for specific
keywords.
Having defined the requirement, I have NO clue where to start. This doesn't
HAVE to be done programmatically, necessarily, so I could use some kind of
web-crawling software if there's something free & easy to use, but I thought
it might not be a bad idea to try it programmatically, just to get my feet
wet. I'm sort of assuming I'd want to use DOM, but I am utterly clueless to
ANYTHING about it at all.
Am I generally on the right track here, or what would people suggest? This
is entirely a personal project, so I'm open to any solution of any kind.
Rob
- References:
- Searching web sub-pages
- From: Robert Morley
- Searching web sub-pages