My intention is to automate the downloading of all pictures from a website that requires a login (a web-form based login, I think).
The website: http://www.cgwallpapers.com
The login url: http://www.cgwallpapers.com/login.php
The registered members url: http://www.cgwallpapers.com/members
A random wallpaper url that is only accessible and downloadable for registered members: http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080
Knowing that viewwallpaper.php takes two parameters, the wallpaper id (from x to y) and the wallpaper res, I would like to write a For loop that generates all the combinations to automate the wallpaper downloads.
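A minimal sketch of such a loop is shown here. The id range and the list of resolutions are placeholder values chosen for illustration (only 1764 and 1920x1080 appear in the URL above), and BuildWallpaperUrls is a hypothetical helper name.

```vbnet
Imports System
Imports System.Collections.Generic

Module WallpaperUrls

    ' Hypothetical helper: builds one URL per (id, resolution) pair.
    Function BuildWallpaperUrls(firstId As Integer, lastId As Integer,
                                resolutions As IEnumerable(Of String)) As List(Of String)
        Const baseUrl As String = "http://www.cgwallpapers.com/members/viewwallpaper.php"
        Dim urls As New List(Of String)
        For id As Integer = firstId To lastId
            For Each res As String In resolutions
                urls.Add(String.Format("{0}?id={1}&res={2}", baseUrl, id, res))
            Next
        Next
        Return urls
    End Function

    Sub Main()
        ' Assumed values, for illustration only.
        Dim resolutions As String() = {"1920x1080", "1280x720"}
        For Each url As String In BuildWallpaperUrls(1764, 1765, resolutions)
            Console.WriteLine(url)
        Next
    End Sub

End Module
```

Generating the URLs separately from downloading them keeps the combination logic easy to check before any request is sent.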
The first thing I tried was simply to use a WebClient, like this:
Dim client As New WebClient()
client.Credentials = New System.Net.NetworkCredential("user", "pass")
client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", "C:\file.jpg")
But that didn't work: it returns the HTML text of a page instead of an image. From what I've read, I think this is because I need to pass the login cookie.
So I've researched many examples on StackOverflow and other sites about how to log in and download a file through HttpWebRequest, because that seems to be the proper way to do it.
This is how I log in to the website and get the proper login cookie (or so I think):
Dim logincookie As CookieContainer
Dim url As String = "http://www.cgwallpapers.com/login.php"
Dim postData As String = "action=go&email=MyUsername&wachtwoord=MyPassword"
Dim tempCookies As New CookieContainer
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
    .Method = "POST"
    .Host = "www.cgwallpapers.com"
    .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    .Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
    .Headers.Add("Accept-Encoding: gzip, deflate")
    .ContentType = "application/x-www-form-urlencoded"
    .UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
    .Referer = "http://www.cgwallpapers.com/login.php"
    .KeepAlive = True
    .CookieContainer = tempCookies
    .ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
    .Write(byteData, 0, byteData.Length)
    .Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
tempCookies.Add(postresponse.Cookies)
logincookie = tempCookies
postresponse.Close()
At this point I'm stuck, because I'm not sure how to use the obtained login cookie to download the pictures.
I suppose that after getting the login cookie I should just perform another request to the desired wallpaper url using the saved cookie, right? But I think I'm doing it wrong: the next code does not work, because postresponse.ContentLength is always -1, so I can't write to the file.
Dim url As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?"
Dim postData As String = "id=1764&res=1920x1080"
Dim byteData As Byte() = Encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
    .Method = "POST"
    .Host = "www.cgwallpapers.com"
    .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    .Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
    .Headers.Add("Accept-Encoding: gzip, deflate")
    .ContentType = "application/x-www-form-urlencoded"
    .UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
    .KeepAlive = True
    ' .Referer = ""
    .CookieContainer = logincookie
    .ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
    .Write(byteData, 0, byteData.Length)
    .Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
Dim memStream As MemoryStream
Using rdr As Stream = postresponse.GetResponseStream
    Dim count As Integer = Convert.ToInt32(postresponse.ContentLength)
    Dim buffer As Byte() = New Byte(count) {}
    Dim bytesRead As Integer
    Do
        bytesRead += rdr.Read(buffer, bytesRead, count - bytesRead)
    Loop Until bytesRead = count
    rdr.Close()
    memStream = New MemoryStream(buffer)
End Using
File.WriteAllBytes("c:\wallpaper.jpg", memStream.ToArray)
How can I fix these issues to download the wallpaper(s) properly?
Here is a complete solution to your question that exclusively uses HttpWebRequest and HttpWebResponse requests to simulate browser requests. I have commented much of the code to hopefully give you an idea of how this all works.
You must change the sUsername and sPassword variables to your own username/password to successfully log into the site.
Optional variables that you may want to change:
sDownloadPath: Currently set to the same folder as the application exe. Change this to the path where you want to download your images.
sImageResolution: Defaults to 1920x1080, which is what you specified in your original question. Change this value to any of the accepted resolution values on the website. Just a warning that I am not 100% sure whether all images have the same resolutions, so changing this value may cause some images to be skipped if they are not available in the desired resolution.
nMaxErrorsInSuccession: Set to 10 by default. Once logged in, the app will continually increment the image id and attempt to download a new image. Some ids do not contain an image, and this is normal, as the image may have been deleted on the server (or maybe the image is not available in the desired resolution). If the app fails to download an image nMaxErrorsInSuccession times in a row, the application will stop, as we assume we have reached the last of the images. You may have to increase this number if more than 10 consecutive images are deleted or not available in the selected resolution.
nCurrentID: Set to 1 by default. This is the image id used by the website to determine which image to serve to the client. The nCurrentID variable is incremented by one on each image download attempt. Depending on time and circumstances, you may not be able to download all images in one session. If this is the case, you can remember which ID you left off on and update this variable accordingly to start on a different id next time. This is also useful when you have successfully downloaded all images and want to run the app later to download newer images.
sUserAgent: Can be any user agent that you want. Currently using Firefox 35.0 for Windows 7. Note that some websites will function differently depending on what user agent you specify, so only change this if you really need to emulate another browser.
NOTE: There is a 3 second pause strategically inserted at various points in the code. Some websites have hammer scripts that will block or even ban users who are browsing a site too quickly. Although removing these lines will speed up the time it takes to download all images, I would not recommend doing so.
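To illustrate the loop described by these variables, here is a rough sketch; it is not the original code from this answer. It assumes that the cookie container has already been filled by a login request, that viewwallpaper.php returns the image bytes directly (the question does not confirm this), and that a non-image response means the id should be skipped. TryDownloadImage, the output file naming, and the user agent string are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Threading

Module DownloadLoopSketch

    ' Hypothetical helper: returns True only when the server answers with an image.
    Function TryDownloadImage(url As String, cookies As CookieContainer,
                              userAgent As String, targetFile As String) As Boolean
        Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        req.Method = "GET"
        req.CookieContainer = cookies   ' reuse the cookies obtained at login
        req.UserAgent = userAgent
        req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Try
            Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
                ' An HTML page here is treated as "not logged in" or "no such image".
                If resp.ContentType Is Nothing OrElse
                   Not resp.ContentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase) Then
                    Return False
                End If
                ' ContentLength can be -1 (unknown), so copy until the stream ends.
                Using body As Stream = resp.GetResponseStream(),
                      output As FileStream = File.Create(targetFile)
                    body.CopyTo(output)
                End Using
                Return True
            End Using
        Catch ex As WebException
            Return False
        End Try
    End Function

    Sub Main()
        ' Assumption: in real use this container comes from the login request.
        Dim cookies As New CookieContainer()
        Dim sDownloadPath As String = AppDomain.CurrentDomain.BaseDirectory
        Dim sImageResolution As String = "1920x1080"
        Dim sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; rv:35.0) Gecko/20100101 Firefox/35.0"
        Dim nMaxErrorsInSuccession As Integer = 10
        Dim nCurrentID As Integer = 1
        Dim nErrors As Integer = 0

        Do While nErrors < nMaxErrorsInSuccession
            Dim url As String = String.Format(
                "http://www.cgwallpapers.com/members/viewwallpaper.php?id={0}&res={1}",
                nCurrentID, sImageResolution)
            Dim target As String = Path.Combine(sDownloadPath,
                String.Format("{0}_{1}.jpg", nCurrentID, sImageResolution))

            If TryDownloadImage(url, cookies, sUserAgent, target) Then
                nErrors = 0          ' a success resets the failure streak
            Else
                nErrors += 1
            End If

            nCurrentID += 1
            Thread.Sleep(3000)       ' pause between requests to avoid hammering the site
        Loop
    End Sub

End Module
```

Copying the response stream until it ends avoids relying on ContentLength, which is the value that was always -1 in the question's code.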
I threw this together; I think it's the right direction.
Try