crawling the net...
From: ask josephsen
Date: 04/29/04
Date: Thu, 29 Apr 2004 11:19:33 +0200
Hi NG
I'm writing a program to crawl the internet. It works by retrieving all the
links in a page, downloading the page behind each link, and then retrieving
all the links from those pages in turn. (If there are better ways, I'd like
to hear them.)
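
The loop described above can be sketched roughly as follows. This is only an illustration in Python, since the post does not name a language. The names LinkCollector, extract_links, crawl and the fetch callback are hypothetical, and the downloading itself is left to the caller so that the sketch stays independent of any particular HTTP library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url):
    """Return the absolute http(s) URLs linked from one page."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        # Resolve against the page URL, then drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. fetch(url) returns HTML text, or None on failure."""
    seen = {start_url}           # URLs already queued, so none is fetched twice
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The set of seen URLs is what keeps the crawl from looping forever when pages link back to each other, and max_pages is an assumed safety limit. A real crawler would normally also honour robots.txt and pause between requests; neither is shown here.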
My problem is relative links (like "../../wohoo.asp"). What is the smartest
way to get the full URL (http://www.xyz.com/wohoo.asp)? Do I have to parse
the relative link against the URL of the page where it was found and then
concatenate the two? Does anyone know how other search engines and crawlers
walk the net?
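
For the relative-link question, most standard libraries already ship a resolver, so hand-written parsing and concatenation is usually not needed. A minimal sketch in Python (chosen only for illustration; the post does not name a language) using urllib.parse.urljoin. The helper name resolve_link and the page URL http://www.xyz.com/a/b/page.asp are assumptions made up for the example.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve href against the URL of the page it was found on."""
    return urljoin(page_url, href)


# Assumed example: the link was found on a page two directories deep.
page = "http://www.xyz.com/a/b/page.asp"

print(resolve_link(page, "../../wohoo.asp"))   # http://www.xyz.com/wohoo.asp
print(resolve_link(page, "other.asp"))         # http://www.xyz.com/a/b/other.asp
print(resolve_link(page, "/top.asp"))          # http://www.xyz.com/top.asp
print(resolve_link(page, "//cdn.xyz.com/x"))   # http://cdn.xyz.com/x
print(resolve_link(page, "http://www.example.org/"))  # already absolute, unchanged
```

Note that the result depends on where the link was found: "../../wohoo.asp" only becomes http://www.xyz.com/wohoo.asp here because the page sits two directories below the root. A page may also declare a <base href="..."> element, in which case that value, not the page's own URL, is the base to resolve against.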
Thanks :)
./ask