Re: Screen Scraping a Password Protected Site
- From: "Blake" <blake.ackland@xxxxxxxxx>
- Date: 17 Dec 2006 03:28:32 -0800
Triple post. Yay!
It's also worth noting that you don't need to use Fiddler to see the
HTTP traffic.
The System.Net classes have been compiled with TRACE turned on, so you
can add a .config file like this:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.diagnostics>
    <sources>
      <source name="System.Net" switchValue="Information"/>
    </sources>
  </system.diagnostics>
</configuration>
...and you will see the HTTP headers going back and forth in the output
window. If you set the level to Verbose you can also see the data.
-Blake
Blake wrote:
I should have stated before that, to log in, you will need to call
CookieWebClient.UploadValues() to post to your site's login form first.
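A rough sketch of that login step, assuming the CookieWebClient class
from my earlier post (quoted below) is in the same project. The URLs,
the form field names ("username", "password") and the values are
made-up placeholders; take the real ones from your site's login form.

```vb
Imports System.Collections.Specialized
Imports System.Text

Module LoginExample
    Sub Main()
        ' CookieWebClient is the class defined in the quoted post below.
        Dim client As New CookieWebClient()

        ' Hypothetical field names and values; copy the real names from
        ' the <input> elements of the site's login form.
        Dim form As New NameValueCollection()
        form.Add("username", "myUser")
        form.Add("password", "myPassword")

        ' Hypothetical login URL. UploadValues sends a POST by default
        ' and returns the response body as a byte array.
        Dim reply As Byte() = _
            client.UploadValues("http://www.somesite.com/login", form)
        Console.WriteLine(Encoding.UTF8.GetString(reply))

        ' Later requests reuse the shared cookie container, so any
        ' session cookie set by the login is sent along with them.
        Dim page As String = _
            client.DownloadString("http://www.somesite.com/members")
        Console.WriteLine(page)
    End Sub
End Module
```

If the login form has hidden fields, they probably need to be added to
the collection as well.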
-Blake
Blake wrote:
Check out the System.Net.CookieContainer class.
You can derive from System.Net.WebClient and override its request and
response methods to store and retrieve cookies in a shared
CookieContainer. Then, once you have logged in to the website, you will
stay logged in.
Something like this (untested) ...
'============================================
Imports System.Net

Public Class CookieWebClient : Inherits WebClient

    ' Shared by all instances, so every request uses the same cookies.
    Private Shared _cookies As CookieContainer = New CookieContainer()

    ' Overridden to add cookie headers to HTTP requests.
    Protected Overrides Function GetWebRequest( _
            ByVal address As System.Uri) As System.Net.WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        If TypeOf request Is HttpWebRequest Then
            DirectCast(request, HttpWebRequest).CookieContainer = _cookies
        End If
        Return request
    End Function

    ' Overridden to save cookies to the container for HTTP requests.
    Protected Overrides Function GetWebResponse( _
            ByVal request As System.Net.WebRequest) As System.Net.WebResponse
        Dim response As WebResponse = MyBase.GetWebResponse(request)
        If TypeOf response Is HttpWebResponse Then
            _cookies.Add(response.ResponseUri, _
                         DirectCast(response, HttpWebResponse).Cookies)
        End If
        Return response
    End Function

    ' Overridden to save cookies to the container for async HTTP requests.
    Protected Overrides Function GetWebResponse( _
            ByVal request As System.Net.WebRequest, _
            ByVal result As System.IAsyncResult) As System.Net.WebResponse
        Dim response As WebResponse = MyBase.GetWebResponse(request, result)
        If TypeOf response Is HttpWebResponse Then
            _cookies.Add(response.ResponseUri, _
                         DirectCast(response, HttpWebResponse).Cookies)
        End If
        Return response
    End Function

End Class
'============================================
Then just use the CookieWebClient class to make your requests:
Dim c As New CookieWebClient
Dim s As String = c.DownloadString("http://www.somesite.com")
Works for me :-)
-Blake
Gregory A Greenman wrote:
I'm trying to screen scrape a site that requires a password. If I
access the site's login page in my browser and view the source, I
see that it does not contain a viewstate.
When my program posts the login information, the response I get
is the same page as if I had logged in using my browser. In the
page it says "Welcome" followed by my name. The cookie collection
returned doesn't contain any cookies (response.cookies.count =
0).
When I access other pages, the login screen is returned instead
of the desired page.
Obviously, I need to somehow maintain the session in subsequent
calls, but how do I do that when there are no cookies and there
is no viewstate?
If I use Fiddler to see what happens when I access the site from
my browser, I can see that the first line for the site (where the
result is 200 and the host says "CONNECT") says "SessionID:
empty" under Session Inspector - Textview for the request. For
the response it says "SessionID: " then several bytes of data.
Subsequent 200/CONNECT lines have that same data for both the
request and the response. This must be what I need to maintain my
session. If anyone can help me figure out how to get this
information and use it, I'll be very grateful.
(I'm using VB in VS2003.)
Thanks.
--
Greg
----
http://www.spencerbooksellers.com
greg00 -at- spencersoft -dot- com