Re: downloading a single file using multiple threads



"Peter Duniho" <NpOeStPeAdM@xxxxxxxxxxxxxxxx> wrote in message news:op.tpwtuily8jd0ej@xxxxxxxxxxxxxxxxxxxxxxx
On Wed, 28 Mar 2007 09:59:21 -0700, Willy Denoyette [MVP] <willy.denoyette@xxxxxxxxxx> wrote:

I know that HTTP 1.1 supports range requests, but this is not exactly my point.
The multipart requests in HTTP 1.1 are meant to request (for very specific application purposes) a single part or multiple parts in a single request, but you can't (AFAIK) request multiple parts in parallel from multiple client threads.
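For reference, the "multiple parts in a single request" case looks like this on the wire. The Range header lists several inclusive byte ranges, and a server that honours it answers with one 206 Partial Content response whose body is multipart/byteranges. A server is also free to ignore Range and send the whole file with 200. The helper below is a minimal, hypothetical sketch that only formats the header value. It does not send anything.

```python
def multi_range_header(ranges):
    """Format several inclusive (first, last) byte ranges as ONE Range header value.

    Hypothetical helper for illustration: a server that supports this replies
    with a single 206 response whose body is multipart/byteranges.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    specs = []
    for first, last in ranges:
        if first < 0 or last < first:
            raise ValueError(f"invalid byte range: {first}-{last}")
        specs.append(f"{first}-{last}")  # byte positions are inclusive
    return "bytes=" + ",".join(specs)


print(multi_range_header([(0, 499), (1000, 1499)]))  # bytes=0-499,1000-1499
```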

Well, I've never actually tried it myself. So I will refrain from claiming that I know firsthand that this generally works. However, the so-called "download managers" all claim to work with HTTP servers. I have no reason to doubt them, and based on what I know about HTTP (which is an extremely simple protocol) it seems likely.

I didn't try it either, so I won't claim it doesn't work. We did some work at this level using DECnet/OSI connecting to shared-nothing clusters running OpenVMS some years ago; we were beaten by the hardware evolution at each step we made, got frustrated, and finally gave up doing this in software.
I know that download managers claim to work over HTTP, but that doesn't mean they support multipart parallel request handling over the same or multiple connections. I don't even know whether the protocol allows you to issue a new range request while a range request is still pending (on the same logical connection).


There is no theoretical reason why it wouldn't work. There's nothing about HTTP that requires servers to restrict their communications to a given client to a single connection, and there's nothing about HTTP that stipulates that an HTTP server needs to coordinate communications on independent connections. If on one connection the client asks for the first megabyte and on a second connection the same client asks for the second megabyte, then if the server is capable of servicing both requests at the same time, there's no reason the client can't wind up receiving both the first and second megabytes in parallel.
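A minimal sketch of the scenario just described: split a file of known size into contiguous byte ranges and request each range on its own connection from a thread pool. It assumes the total size is already known (e.g. from a Content-Length header) and that the server honours Range requests. The function names are hypothetical. The `fetch` parameter lets you swap in a different transport, and the sketch treats any reply other than 206 Partial Content as failure rather than trying to recover.

```python
import concurrent.futures
import urllib.request


def split_ranges(total_size, parts):
    """Split [0, total_size) into `parts` contiguous inclusive (first, last) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_range(url, first, last, timeout=30):
    """Fetch bytes first..last (inclusive) of `url`; each call opens its own connection."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:  # e.g. 200 means the server ignored the Range header
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()


def parallel_download(url, total_size, parts=4, fetch=fetch_range):
    """Download `url` as `parts` concurrent range requests and reassemble it."""
    ranges = split_ranges(total_size, parts)
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in input order, so chunks come back in file order.
        chunks = list(pool.map(lambda r: fetch(url, r[0], r[1]), ranges))
    data = b"".join(chunks)
    if len(data) != total_size:
        raise RuntimeError(f"expected {total_size} bytes, got {len(data)}")
    return data
```

Whether this is any faster than a single request depends entirely on where the bottleneck is, as the rest of the thread discusses.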

Agreed, but what's the advantage in a simple client-server scenario? By simple, I mean an ordinary PC connected over a dedicated LAN to an HTTP server.


Jon has already outlined the scenarios in which doing so would be helpful. You are absolutely correct that in many cases, the HTTP server is already providing data as fast as it can, and introducing multiple connections to the same server will only slow things down.

Agreed.

Likewise, if the HTTP server does throttle each connection, but your Internet connection is so slow that it's not even as fast as the throttled connection, multiple connections to the same server will again slow things down.


Sure.

And of course if the HTTP server is configured to throttle the transfer for each connection, it is often also configured to disallow multiple connections from the same client IP address.


They better do ;-)

There are lots of situations in which a "download manager" won't help at all. It's one of the reasons I don't bother with them...their ability to improve things is greatly overstated IMHO.

But it is true that there are scenarios in which multiple connections retrieving the same file, either from the same HTTP server or from multiple mirror servers, can indeed improve throughput. These scenarios may not be very common, but for some people they occur often enough to make it worthwhile using software that takes advantage of them.

Note that there's nothing to stop a download manager from retrieving different parts of the same file from multiple servers as well. Assuming an identical file stored on various mirrors, it doesn't matter which mirror a given part of the file comes from.
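To illustrate the mirror case, here is a minimal sketch that cuts a file into fixed-size chunks and deals them out round-robin across mirrors. It produces a plan of `(mirror_url, first, last)` tuples that a downloader could then fetch with Range requests. As the paragraph above says, this only works if every mirror holds a byte-identical copy. The function name, chunk size and mirror URLs are hypothetical, and a real tool would also want to verify the result (e.g. against a published checksum).

```python
def plan_mirror_download(mirrors, total_size, chunk_size):
    """Cut a file into fixed-size chunks and assign them round-robin to mirrors.

    Hypothetical helper: assumes every mirror serves a byte-identical file.
    Returns (mirror_url, first, last) tuples in file order; byte ranges are inclusive.
    """
    if not mirrors:
        raise ValueError("need at least one mirror")
    if total_size <= 0 or chunk_size <= 0:
        raise ValueError("total_size and chunk_size must be positive")
    plan = []
    for index, first in enumerate(range(0, total_size, chunk_size)):
        last = min(first + chunk_size, total_size) - 1  # last chunk may be short
        plan.append((mirrors[index % len(mirrors)], first, last))
    return plan


mirrors = ["http://mirror-a.example/file.iso", "http://mirror-b.example/file.iso"]
for url, first, last in plan_mirror_download(mirrors, 2500, 1000):
    print(url, f"bytes={first}-{last}")
```

Round-robin is the simplest assignment. A smarter scheduler might hand more chunks to whichever mirror is currently responding fastest.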

That's true, but these will use dedicated protocols, won't they? The clients would also need multiple NICs installed, connected over segmented LANs and/or routers, to gain any speed advantage from the parallelism.

It just depends. One well-known example of a dedicated protocol to do this sort of thing is BitTorrent. And you're right in suggesting that when this technique is used, it's not always via HTTP. But as you can tell from the success of BitTorrent, you don't actually need multiple NICs installed to take advantage of the technique, nor any special hardware at all.

The goal is to saturate your inbound network connection. If a server throttles I/O (or is otherwise restricted) to a level below the speed of your inbound connection (something that is becoming more and more common as broadband connections get faster), then having that server send data on multiple connections (assuming it's not smart enough to detect and prevent that), or having multiple servers each send different portions of the same data to the same client, can help saturate your inbound connection and minimize the time it takes to download 100% of the requested data.
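A back-of-the-envelope model of this argument: the aggregate rate is bounded both by your inbound link and by the sum of the per-connection limits the server(s) impose, whichever is lower. This is a simplification. It ignores connection setup cost and TCP streams competing with each other, which is exactly why extra connections can make things slower when the link itself is the bottleneck. The function names and figures are illustrative assumptions.

```python
def effective_throughput(link_capacity, per_connection_cap, connections):
    """Rough model (Mbit/s): the aggregate rate is the lower of the inbound link
    capacity and the combined per-connection server limits. Ignores overhead."""
    if connections < 1:
        raise ValueError("need at least one connection")
    return min(link_capacity, per_connection_cap * connections)


def download_seconds(size_megabits, link_capacity, per_connection_cap, connections):
    """Time to transfer `size_megabits` at the modelled aggregate rate."""
    return size_megabits / effective_throughput(link_capacity, per_connection_cap, connections)


# Illustrative figures: 8 Mbit/s inbound link, server throttles each connection to 2 Mbit/s.
for n in (1, 2, 4, 8):
    rate = effective_throughput(8.0, 2.0, n)
    print(f"{n} connection(s): {rate} Mbit/s, {download_seconds(800, 8.0, 2.0, n)} s for 800 Mbit")
```

With these numbers, going from one to four connections cuts the time from 400 s to 100 s. Beyond four, the link is saturated and nothing more is gained. If the link were slower than a single throttled connection (the case mentioned earlier in the thread), the model shows no gain at all from extra connections.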


Agreed. However, I don't know whether this discussion is of any help to the OP; let's see if he comes back.


Willy.

