crawling the net...

From: ask josephsen
Date: 04/29/04


Date: Thu, 29 Apr 2004 11:19:33 +0200

Hi NG

I'm writing a program to crawl the internet. It works by retrieving all the
links in a page, downloading the page behind each link, and retrieving the
links from those pages in turn. (If there are better ways, I'd like to hear
them.)
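For reference, the loop described above is essentially a breadth-first crawl. Here is a minimal sketch in Python, not a definitive implementation. The names LinkParser, extract_links and crawl are made up for illustration, and fetch is a caller-supplied function (an assumption) that returns the HTML for a URL or None on failure, so the sketch itself does no network access. A "seen" set keeps the crawler from downloading the same page twice.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return the links in html as absolute URLs without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(page_url, h)).url for h in parser.hrefs]


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. fetch(url) is assumed to return HTML or None."""
    seen = {start_url}            # URLs already queued, to avoid repeats
    queue = deque([start_url])    # URLs waiting to be downloaded
    visited = []                  # URLs downloaded successfully, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):
            # Only follow web links; skips mailto:, javascript:, etc.
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

A real crawler would also want to respect robots.txt, limit its request rate, and probably restrict itself to a set of hosts; none of that is shown here.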

My problem is relative links (like "../../wohoo.asp"). What is the smartest
way to get the full URL (http://www.xyz.com/wohoo.asp)? Do I have to parse
the relative link against the URL of the page where it was found and then
concatenate the two? Does anyone know how other search engines and crawlers
walk the net?
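On the relative-link question: hand-rolled string concatenation is usually not needed, since most URL libraries can resolve a reference against a base URL following the standard rules. As a hedged illustration in Python, the standard library's urllib.parse.urljoin does this. The page URL below is made up for the example (the post only gives the result URL), and resolve is a hypothetical wrapper name.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page on which the links were found.
PAGE_URL = "http://www.xyz.com/a/b/page.asp"


def resolve(page_url, href):
    """Resolve href (relative or absolute) against the page it came from."""
    return urljoin(page_url, href)


# "../.." climbs two directories up from /a/b/
print(resolve(PAGE_URL, "../../wohoo.asp"))       # http://www.xyz.com/wohoo.asp
# A bare file name stays in the page's own directory
print(resolve(PAGE_URL, "other.asp"))             # http://www.xyz.com/a/b/other.asp
# A leading slash restarts from the site root
print(resolve(PAGE_URL, "/top.asp"))              # http://www.xyz.com/top.asp
# An absolute URL is returned unchanged
print(resolve(PAGE_URL, "http://example.org/x"))  # http://example.org/x
```

One caveat: if the page contains a <base href="..."> element, links should be resolved against that value rather than against the URL the page was downloaded from.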

Thanks :)

./ask


