Screen Scraping a Password-Protected Site



I'm trying to screen scrape a site that requires a password. If I
access the site's login page in my browser and view the source, I
see that it does not contain a viewstate.

When my program posts the login information, the response I get
is the same page I see when I log in using my browser: it says
"Welcome" followed by my name. However, the cookie collection on
the response doesn't contain any cookies (response.cookies.count
= 0).

When I access other pages, the login screen is returned instead
of the desired page.

Obviously, I need to somehow maintain the session in subsequent
calls, but how do I do that when there are no cookies and there
is no viewstate?
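
For illustration, here is a minimal sketch of the usual cookie-based
way to carry a session from a login POST to later requests. It is
written in Python rather than the VB mentioned in this post, and it
assumes the site tracks the login with a cookie, which is not
confirmed here. The URLs and form field names in the usage note are
hypothetical placeholders, and make_session_opener and
login_and_fetch are names made up for this sketch.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return (opener, jar). The jar stores any cookies the server sets,
    and the opener sends them back on every later request it makes."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, page_url, fields):
    """POST the login form, then GET another page with the same opener.

    `fields` is a dict of form field names to values; the real names
    depend on the site's login form (assumption: a plain form POST).
    """
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Passing data makes this a POST; cookies set here land in the jar.
    with opener.open(login_url, data=body) as resp:
        resp.read()
    # Same opener, so cookies from the login response are sent here.
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would look something like
login_and_fetch(opener, "https://example.com/login",
"https://example.com/orders", {"user": "...", "password": "..."}),
where every URL and field name is a placeholder. The key design
point is that one cookie store is shared by all requests. In .NET
the analogous step is assigning the same CookieContainer instance to
each HttpWebRequest; when no CookieContainer is assigned, the
response's Cookies collection stays empty even if the server sent
Set-Cookie headers.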

If I use Fiddler to see what happens when I access the site from
my browser, the first entry for the site (where the result is 200
and the host says "CONNECT") shows "SessionID: empty" for the
request under Session Inspector - TextView. For the response it
shows "SessionID: " followed by several bytes of data. Subsequent
200/CONNECT entries have that same data for both the request and
the response. This must be what I need to maintain my session. If
anyone can help me figure out how to get this information and use
it, I'll be very grateful.

(I'm using VB in VS2003.)

Thanks.


--
Greg
----
http://www.spencerbooksellers.com
greg00 -at- spencersoft -dot- com


