Using: Delphi 2010, latest version of Indy
I am trying to scrape data from Google's AdSense web page in order to retrieve the reports, but I have been unsuccessful so far. The application stops after the first request and does not proceed.
Using Fiddler to debug the traffic to the Google AdSense website while loading the AdSense page in a web browser, I can see that the browser's request generates a number of redirects before the page is loaded.
However, my Delphi application is only generating a couple of requests before it stops.
Here are the steps I have followed:
- Drop an IdHTTP and an IdSSLIOHandlerSocketOpenSSL1 component on the form.
- Set the IdHTTP component's AllowCookies and HandleRedirects properties to True, and its IOHandler property to IdSSLIOHandlerSocketOpenSSL1.
- Set the IdSSLIOHandlerSocketOpenSSL1 component's SSLOptions.Method property to sslvSSLv23.
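For reference, the same settings expressed in code might look like the sketch below. It is a minimal console program, not my actual form code: the URL and the output file name are placeholders, the RedirectMaximum value is an arbitrary choice, and it assumes the OpenSSL DLLs (libeay32.dll and ssleay32.dll) can be found next to the executable.

```delphi
program RedirectSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL;

var
  Http: TIdHTTP;
  Ssl: TIdSSLIOHandlerSocketOpenSSL;
  Output: TMemoryStream;
begin
  Http := TIdHTTP.Create(nil);
  try
    // The SSL handler is owned by Http, so it is freed together with it.
    Ssl := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    Ssl.SSLOptions.Method := sslvSSLv23;  // an enum value, not a string
    Http.IOHandler := Ssl;

    Http.AllowCookies := True;     // keep cookies between requests
    Http.HandleRedirects := True;  // follow 301/302 responses automatically
    Http.RedirectMaximum := 15;    // arbitrary upper bound on redirects

    Output := TMemoryStream.Create;
    try
      // Placeholder URL and file name, chosen only for illustration.
      Http.Get('https://www.google.com/adsense/', Output);
      Output.SaveToFile('adsense.html');
    finally
      Output.Free;
    end;
  finally
    Http.Free;
  end;
end.
```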
Finally I have this code:
procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    // Fetch the URL passed in as AURL (this previously read FURL)
    IdHTTP1.Get(AURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
However, it does not get to the login page as expected. I would expect it to behave like a web browser and follow the redirects until it reaches the final page.
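One way to see how far IdHTTP actually gets is to log each redirect through the OnRedirect event. The sketch below is only a diagnostic aid under some assumptions: TRedirectLogger is a hypothetical helper class, the URL is a placeholder, and the handler signature is the one declared for TIdHTTPOnRedirectEvent in Indy 10, which may differ in other Indy versions.

```delphi
program RedirectLogSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL;

type
  // Hypothetical helper; an event handler must be a method of an object.
  TRedirectLogger = class
    procedure HandleRedirect(Sender: TObject; var dest: string;
      var NumRedirect: Integer; var Handled: Boolean;
      var VMethod: TIdHTTPMethod);
  end;

procedure TRedirectLogger.HandleRedirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: TIdHTTPMethod);
begin
  // Handled is left unchanged so the HandleRedirects setting still applies.
  Writeln(Format('Redirect %d -> %s', [NumRedirect, dest]));
end;

var
  Http: TIdHTTP;
  Ssl: TIdSSLIOHandlerSocketOpenSSL;
  Logger: TRedirectLogger;
  Output: TMemoryStream;
begin
  Logger := TRedirectLogger.Create;
  Http := TIdHTTP.Create(nil);
  Output := TMemoryStream.Create;
  try
    Ssl := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    Ssl.SSLOptions.Method := sslvSSLv23;
    Http.IOHandler := Ssl;
    Http.AllowCookies := True;
    Http.HandleRedirects := True;
    Http.OnRedirect := Logger.HandleRedirect;
    try
      // Placeholder URL, chosen only for illustration.
      Http.Get('https://www.google.com/adsense/', Output);
      Writeln('Final response: ', Http.ResponseCode, ' ', Http.ResponseText);
    except
      on E: Exception do
        Writeln('Request failed: ', E.ClassName, ': ', E.Message);
    end;
  finally
    Output.Free;
    Http.Free;
    Logger.Free;
  end;
end.
```

Comparing the logged destinations with the sequence Fiddler shows for the browser should indicate at which hop the two diverge.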
This is the output of the headers from Fiddler:
HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block
Firstly, is there anything wrong with this output?
Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?