This question is derived from another question, Automate picture downloads from website with authentication, where I asked how to download a picture from a specific website that requires a login.
There are two websites from the same company, cgwallpapers.com and gamewallpapers.com. With the help of the user who answered the other question I finally managed to automate the downloads from one of them, but I'm not able to reproduce the same steps on the gamewallpapers.com website.
I may be wrong about some of the things I'm going to say because I'm inexperienced with web requests, so if a helper/expert has the time, please verify that the parameters and other details are as I describe them.
In cgwallpapers.com, I basically set the query like this to download a wallpaper:
http://www.cgwallpapers.com/members/getwallpaper.php?id=100&res=1920x1080
But I found that on gamewallpapers.com I cannot use the same query, because it seems to look like this:
With cgwallpapers.com it is easier because I can just use an incremental For loop over the ids with a specific wallpaper resolution, but for gamewallpapers.com I can't figure out how to automate the wallpaper downloads; it seems to need a totally different treatment, if I'm not wrong.
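As a minimal sketch of that incremental loop for cgwallpapers.com: the helper names BuildWallpaperUrl and BuildWallpaperUrls and the id range 100 to 102 are only illustrative, not part of the real source. It only builds the URLs; the actual download still needs the login cookies.

```vbnet
Imports System
Imports System.Collections.Generic

Module CgWallpaperUrls

    ' Builds the download URL for one wallpaper id and resolution,
    ' following the getwallpaper.php?id=...&res=... pattern.
    Function BuildWallpaperUrl(ByVal id As Integer, ByVal res As String) As String
        Return String.Format("http://www.cgwallpapers.com/members/getwallpaper.php?id={0}&res={1}", id, res)
    End Function

    ' Returns the URLs for every id from firstId to lastId (inclusive).
    Function BuildWallpaperUrls(ByVal firstId As Integer,
                                ByVal lastId As Integer,
                                ByVal res As String) As List(Of String)
        Dim urls As New List(Of String)
        For id As Integer = firstId To lastId
            urls.Add(BuildWallpaperUrl(id, res))
        Next id
        Return urls
    End Function

    Sub Main()
        ' Illustrative id range; real ids depend on the site.
        For Each url As String In BuildWallpaperUrls(100, 102, "1920x1080")
            Console.WriteLine(url)
        Next url
    End Sub

End Module
```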
So, I don't know what to try or even how to do it.
After logging into gamewallpapers.com, this is how I'm trying to download a wallpaper. Of course this does not work because I'm not using the proper query, but this code worked for the cgwallpapers.com site, so I'll show it in case it helps:
NOTE: WallpaperInfo is a non-relevant object that I use to return the downloaded wallpaper image stream; it is a lot of code, so I skipped it.
    ''' <summary>
    ''' Tries to download the specified wallpaper from GameWallpapers server.
    ''' </summary>
    ''' <param name="id">The wallpaper id.</param>
    ''' <param name="res">The wallpaper resolution.</param>
    ''' <param name="cookieCollection">The cookie collection.</param>
    ''' <returns>A <see cref="WallpaperInfo"/> instance containing the wallpaper info and the image stream.</returns>
    Private Function GetWallpaperMethod(ByVal id As String,
                                        ByVal res As String,
                                        ByRef cookieCollection As CookieCollection) As WallpaperInfo

        Dim request As HttpWebRequest
        Dim url As String = String.Format("http://www.gamewallpapers.com/members/getwallpaper.php?id={0}&res={1}", id, res)
        Dim contentDisposition As String
        Dim webResponse As WebResponse = Nothing
        Dim responseStream As Stream = Nothing
        Dim imageStream As MemoryStream = Nothing
        Dim wallInfo As WallpaperInfo = Nothing

        Try
            request = DirectCast(HttpWebRequest.Create(url), HttpWebRequest)

            With request
                .Method = "GET"
                .Headers.Add("Accept-Language", "en-US,en;q=0.5")
                .Headers.Add("Accept-Encoding", "gzip, deflate")
                .Headers.Add("Keep-Alive", "300")
                .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                .AllowAutoRedirect = False
                .UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
                .KeepAlive = True
            End With

            If cookieCollection IsNot Nothing Then
                ' Pass cookie info so that we remain logged in.
                request.CookieContainer = Me.SetCookieContainer(url, cookieCollection)
            End If

            webResponse = request.GetResponse

            Using webResponse

                contentDisposition = CType(webResponse, HttpWebResponse).Headers("Content-Disposition")

                If Not String.IsNullOrEmpty(contentDisposition) Then ' There is an image to download.

                    Dim filename As String = contentDisposition.Substring(contentDisposition.IndexOf("=") + "=".Length).
                                             TrimStart(" "c).TrimEnd({" "c, ";"c})

                    Try
                        imageStream = New MemoryStream
                        responseStream = webResponse.GetResponseStream

                        Using responseStream
                            Dim buffer(2047) As Byte
                            Dim read As Integer

                            Do
                                read = responseStream.Read(buffer, 0, buffer.Length)
                                imageStream.Write(buffer, 0, read)
                            Loop Until read = 0

                            responseStream.Close()
                        End Using

                    Catch ex As Exception
                        Throw

                    End Try

                    ' This is the object that I'll return
                    ' that I'm storing the url, the wallpaper id,
                    ' the wallpaper resolution, the wallpaper filename
                    ' and finally the downloaded MemoryStream (the wallpaper image stream)
                    wallInfo = New WallpaperInfo(url:=url,
                                                 id:=id,
                                                 resolution:=res,
                                                 filename:=filename,
                                                 imageStream:=imageStream)

                End If ' String.IsNullOrEmpty(contentDisposition)

            End Using ' webResponse

        Catch ex As Exception
            Throw

        Finally
            If webResponse IsNot Nothing Then
                webResponse.Close()
            End If
            If responseStream IsNot Nothing Then
                responseStream.Close()
            End If

        End Try

        Return wallInfo

    End Function

    Private Function SetCookieContainer(ByVal url As String,
                                        ByVal cookieCollection As CookieCollection) As CookieContainer

        Dim cookieContainer As New CookieContainer
        Dim refDate As Date

        For Each oldCookie As Cookie In cookieCollection

            If Not DateTime.TryParse(oldCookie.Value, refDate) Then

                Dim newCookie As New Cookie

                With newCookie
                    .Name = oldCookie.Name
                    .Value = oldCookie.Value
                    .Domain = New Uri(url).Host
                    .Secure = False
                End With

                cookieContainer.Add(newCookie)

            End If

        Next oldCookie

        Return cookieContainer

    End Function
Here is the full source that I'm trying to get working, with an example usage of how I expected it to work (a For loop that increments the wallpaper ids to automate the downloads). It works perfectly when I CHANGE the base URL from gamewallpapers.com to cgwallpapers.com, because this source only works for cgwallpapers.com, but I'm trying to use it with the gamewallpapers.com URL:
Update:
As promised, I have come up with a "proper" solution to your question for gamewallpapers.com using the Telerik Testing Framework.
You must change the sUsername and sPassword variables to your own username/password to successfully log into the site.
Optional variables that you may want to change:
sResolutionString: Defaults to 1920x1080, which is what you specified in your original question. Change this value to any of the accepted resolution values on the website. Just a warning that I am not 100% sure if all images have the same resolutions, so changing this value may cause some images to be skipped if they do not have an image in the desired resolution.
sDownloadPath: Currently set to the same folder as the application exe. Change this to the path where you want to download your images.
sUserAgent: Defaults to the user agent for Internet Explorer 11 for Windows 7. Since the Telerik Testing Framework controls a real browser (whatever IE version you have installed on your PC in this case), it uses the "real" user agent when sending requests. This user agent string is only used when downloading wallpapers using HttpWebRequest, and the default is most likely unnecessary since the included code will capture the user agent used by Telerik and save it for later use.
nMaxSkippedFilesInSuccession: Set to 10 by default. When trying to download a wallpaper image, the app will check if the filename already exists in your download directory. If it exists then the file will not be downloaded and a skip counter will be incremented. If the skip counter reaches the value of nMaxSkippedFilesInSuccession then the app stops, as it assumes you have downloaded the rest of the files in a previous session. Note: In theory this value could even be set to 1 or 2, as the filenames are unique and therefore would never overlap. The problem is that the toplist.php page is sorted by date, and if they add x new images while you are running this app, then when you go to the next page the images will be shifted by x. If x is greater than nMaxSkippedFilesInSuccession then you will most likely find that the app ends prematurely, as you will be trying to download a number of the same images over again because of the shift.
nCurrentPageID: Set to 0 by default. The list page toplist.php accepts a query string argument called Start which tells the page which index to start from depending on your chosen search arguments. The list shows 24 images per page, so the nCurrentPageID variable must be divisible by 24 or else you may end up skipping images. Depending on time and circumstances you may not be able to download all images in one session. If this is the case you can remember which nCurrentPageID you left off on and update this variable accordingly to start on a different id next time (keep in mind that the images may get shifted as new wallpapers are added to the site, since the list page is sorted by wallpaper date).
To use the Telerik Testing Framework you only need to install the setup file and then include a reference to ArtOfTest.WebAii.dll.
One quirk about using the testing framework (at least with Internet Explorer) is that it doesn't allow you to start the browser as a hidden process. I have talked to Telerik support about this and they claim that it is not possible to do, although other web scraping frameworks like Watin do support this feature (I personally still prefer Watin for this and other reasons, but it is quite old now and not updated since 2011). Since it is nice to run web scraping tasks in the background without them getting in the way of using your computer, this example starts the browser minimized (which Telerik does support) and then uses Windows API calls to hide the browser process. This is a bit of a hack, but it is useful and works well in my experience.
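To illustrate the window-hiding hack in isolation, here is a minimal sketch. It assumes the Win32 ShowWindow function with SW_HIDE is the API call used and that the browser process is named "iexplore"; both are assumptions for illustration, and the HideMainWindow helper is a hypothetical name.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Runtime.InteropServices

Module WindowHider

    ' SW_HIDE (0) tells ShowWindow to hide the window.
    Private Const SW_HIDE As Integer = 0

    <DllImport("user32.dll")>
    Private Function ShowWindow(ByVal hWnd As IntPtr, ByVal nCmdShow As Integer) As Boolean
    End Function

    ' Hides the main window of the given process.
    ' Returns False when the process has no main window to hide.
    Function HideMainWindow(ByVal proc As Process) As Boolean
        proc.Refresh() ' Re-read the cached process info, including the window handle.
        Dim handle As IntPtr = proc.MainWindowHandle
        If handle = IntPtr.Zero Then
            Return False
        End If
        ShowWindow(handle, SW_HIDE)
        Return True
    End Function

    Sub Main()
        ' Assumption: the automated browser runs as "iexplore".
        For Each proc As Process In Process.GetProcessesByName("iexplore")
            HideMainWindow(proc)
        Next proc
    End Sub

End Module
```

Note that this hides every Internet Explorer window it finds, so in practice you would want to restrict it to the process started by the automation.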
In my original answer I mentioned that you would most likely have to crawl the toplist.php page by clicking links and building the download url but I was able to get this to work without clicking into any pages other than toplist.php. This is only possible because the wallpaper filename (which is basically the id that you need to download with) is partially contained in the preview image. I also originally thought that the keystr query string parameter was some kind of id that "protected" the download but it is actually not required at all to get the wallpaper.
One last thing to mention is that the toplist.php page can be sorted by rating or date. Rating is very volatile and subject to change at any moment as people vote for images so this is not a good sort method for this type of work. We use the date in this case because it works well for sorting and should always have the images in the same order as before but there is a small issue: It doesn't seem to allow you to sort in the reverse order. Therefore the newest images always appear at the top on the first page. This causes images to shift over in the list and will most likely cause you to re-test the same images over again when this happens. For cgwallpapers.com this is not a problem because new images will receive a new (higher) id value and we can just remember the last id that we left off on and test the next id in succession to see if there are new images. For gamewallpapers.com we always re-run from pageid 0 and keep going until we reach a certain number of skipped files to know when we have found the end of the images since last download.
Here is the code. Let me know if you have questions:
Usually the tool Wget would do the trick and allow you to download all the files in a website's directory. Unfortunately I tried it and it didn't work; I am unsure if it's because I am not a member of the site or because the images are stored in a database.
Looking at the query string, I believe they are deliberately not using numeric IDs (for security reasons, so people can't easily get a web dump of their site), judging by the alphanumeric wallpaper name and the keystr that is required:
If Wget fails you will need to write a screen scraper to download the links on each page, e.g.:
You can easily paginate downloading of all pages by incrementing the &start query string parameter:
http://www.gamewallpapers.com/toplist.php?start=24&action=go&title=&maxage=0&latestnr=0&platform=&resolution=&cyberbabes=&membersonly2=&rating=0&minimumvotes2=0&sort=date
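A rough sketch of that pagination, assuming each page holds 24 entries (inferred from the start=24 step above) and that the other query string values can stay as in the example URL. BuildToplistUrl and BuildToplistUrls are hypothetical helper names, and you would have to decide yourself how many pages to request.

```vbnet
Imports System
Imports System.Collections.Generic

Module ToplistPaging

    ' Assumed number of wallpapers listed per toplist.php page.
    Private Const PageSize As Integer = 24

    ' Builds the list page URL for a given start index.
    Function BuildToplistUrl(ByVal start As Integer) As String
        Return String.Format(
            "http://www.gamewallpapers.com/toplist.php?start={0}&action=go&title=&maxage=0&latestnr=0&platform=&resolution=&cyberbabes=&membersonly2=&rating=0&minimumvotes2=0&sort=date",
            start)
    End Function

    ' Returns the URLs of the first pageCount list pages (start = 0, 24, 48, ...).
    Function BuildToplistUrls(ByVal pageCount As Integer) As List(Of String)
        Dim urls As New List(Of String)
        For page As Integer = 0 To pageCount - 1
            urls.Add(BuildToplistUrl(page * PageSize))
        Next page
        Return urls
    End Function

    Sub Main()
        For Each url As String In BuildToplistUrls(3)
            Console.WriteLine(url)
        Next url
    End Sub

End Module
```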
Once you have all the links for the images you can then download them using WebClient or HttpWebRequest.
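For the download step, something along these lines might work. It is only a sketch: CookieAwareWebClient and DownloadAll are hypothetical names, and it assumes you already have a CookieContainer from a logged-in session plus a mapping from each image URL to the file path where it should be saved.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net

' WebClient that attaches the cookies of a logged-in session to every request.
Public Class CookieAwareWebClient
    Inherits WebClient

    Private ReadOnly _cookies As CookieContainer

    Public Sub New(ByVal cookies As CookieContainer)
        _cookies = cookies
    End Sub

    Protected Overrides Function GetWebRequest(ByVal address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        Dim httpRequest As HttpWebRequest = TryCast(request, HttpWebRequest)
        If httpRequest IsNot Nothing Then
            ' Send the session cookies so the site treats us as logged in.
            httpRequest.CookieContainer = _cookies
        End If
        Return request
    End Function

End Class

Public Module WallpaperDownloader

    ' downloads maps each image URL (key) to its target file path (value).
    Public Sub DownloadAll(ByVal downloads As IDictionary(Of String, String),
                           ByVal cookies As CookieContainer)
        Using client As New CookieAwareWebClient(cookies)
            For Each pair As KeyValuePair(Of String, String) In downloads
                client.DownloadFile(pair.Key, pair.Value)
            Next pair
        End Using
    End Sub

End Module
```

WebClient does not keep cookies between requests on its own, which is why the sketch overrides GetWebRequest; with HttpWebRequest you would set CookieContainer on each request directly.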