Automate picture downloads from website with authe

Posted 2019-02-26 20:46

Question:

My goal is to automate downloading all pictures from a website that requires a login (a web-form based login, I think).

The website: http://www.cgwallpapers.com

The login url: http://www.cgwallpapers.com/login.php

The registered members url: http://www.cgwallpapers.com/members

A random wallpaper URL that is only accessible and downloadable for registered members: http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080

Knowing that viewwallpaper.php takes two parameters, the wallpaper id (from x to y) and the wallpaper res, I would like to write a For loop that generates all the combinations to automate the wallpaper downloads.

The first thing I tried was simply using a WebClient like this:
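The id/resolution combinations described above could be generated with two nested loops. This is only a minimal sketch: the helper name BuildWallpaperUrls and the idea of passing a list of resolutions are assumptions for illustration, and it only builds the URL strings without downloading anything.

```vb
Imports System.Collections.Generic

Module WallpaperUrls
    ' Hypothetical helper: builds one URL per (id, resolution) pair.
    Function BuildWallpaperUrls(firstId As Integer, lastId As Integer,
                                resolutions As IEnumerable(Of String)) As List(Of String)
        Const baseUrl As String = "http://www.cgwallpapers.com/members/viewwallpaper.php"
        Dim urls As New List(Of String)

        For id As Integer = firstId To lastId
            For Each res As String In resolutions
                urls.Add(String.Format("{0}?id={1}&res={2}", baseUrl, id, res))
            Next
        Next

        Return urls
    End Function
End Module
```

Calling BuildWallpaperUrls(1764, 1765, New String() {"1920x1080"}) would return two URLs, one per id.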

Dim client As New WebClient()
client.Credentials = New System.Net.NetworkCredential("user", "pass")
client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", "C:\file.jpg")

But that didn't work: it returns HTML text instead of an image. From what I've read, I think that is because I need to pass the login cookie.

So I've researched many examples on StackOverflow and other sites about how to log in and download a file through HttpWebRequest, because that seems to be the proper way to do it.

This is how I log in to the website and get the proper login cookie (or so I think):
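WebClient.Credentials is used for HTTP-level authentication (Basic, Digest, NTLM), so it does nothing for a login that goes through an HTML form. One common workaround is a WebClient subclass that attaches the same CookieContainer to every request, so the session cookie set by the login POST is sent with later downloads. The sketch below assumes the form fields are action, email and wachtwoord, as in the post data used elsewhere in this thread; the class name CookieAwareWebClient and the method DownloadAsMember are made up for illustration, and whether viewwallpaper.php returns the image itself or an HTML page is not something this sketch can confirm.

```vb
Imports System
Imports System.Collections.Specialized
Imports System.Net

' WebClient that keeps cookies between requests (hypothetical helper class).
Public Class CookieAwareWebClient
    Inherits WebClient

    Public ReadOnly Cookies As New CookieContainer()

    Protected Overrides Function GetWebRequest(address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        Dim httpRequest As HttpWebRequest = TryCast(request, HttpWebRequest)

        If httpRequest IsNot Nothing Then
            ' Share one cookie container across all requests made by this client
            httpRequest.CookieContainer = Cookies
        End If

        Return request
    End Function
End Class

Module LoginExample
    ' Assumed flow: post the login form, then download with the same client.
    Sub DownloadAsMember(email As String, password As String, targetFile As String)
        Using client As New CookieAwareWebClient()
            Dim form As New NameValueCollection()
            form.Add("action", "go")
            form.Add("email", email)
            form.Add("wachtwoord", password)

            ' UploadValues sends the pairs as application/x-www-form-urlencoded
            client.UploadValues("http://www.cgwallpapers.com/login.php", "POST", form)
            client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", targetFile)
        End Using
    End Sub
End Module
```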

Dim logincookie As CookieContainer

Dim url As String = "http://www.cgwallpapers.com/login.php"
Dim postData As String = "action=go&email=MyUsername&wachtwoord=MyPassword"
Dim tempCookies As New CookieContainer
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)

Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
    .Method = "POST"
    .Host = "www.cgwallpapers.com"
    .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    .Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
    .Headers.Add("Accept-Encoding: gzip, deflate")
    .ContentType = "application/x-www-form-urlencoded"
    .UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
    .Referer = "http://www.cgwallpapers.com/login.php"
    .KeepAlive = True

    postReq.CookieContainer = tempCookies
    postReq.ContentLength = byteData.Length
End With

Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
    .Write(byteData, 0, byteData.Length)
    .Close()
End With

Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)

tempCookies.Add(postresponse.Cookies)
logincookie = tempCookies

postresponse.Close()
postreqstream.Close()

At this point I'm stuck, because I'm not sure how to use the obtained login cookie to download the pictures.

I suppose that after getting the login cookie I should just perform another request to the desired wallpaper URL using the saved login cookie, right? But I think I'm doing it wrong: the following code does not work. postresponse.ContentLength is always -1, so I can't write to the file.

Dim url As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?"
Dim postData As String = "id=1764&res=1920x1080"

Dim byteData As Byte() = Encoding.GetBytes(postData)

Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
    .Method = "POST"
    .Host = "www.cgwallpapers.com"
    .Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    .Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
    .Headers.Add("Accept-Encoding: gzip, deflate")
    .ContentType = "application/x-www-form-urlencoded"
    .UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
    .KeepAlive = True
    ' .Referer = ""

    .CookieContainer = logincookie
    .ContentLength = byteData.Length
End With

Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
    .Write(byteData, 0, byteData.Length)
    .Close()
End With

Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)

Dim memStream As MemoryStream
Using rdr As Stream = postresponse.GetResponseStream
    Dim count As Integer = Convert.ToInt32(postresponse.ContentLength)
    Dim buffer As Byte() = New Byte(count) {}
    Dim bytesRead As Integer
    Do
        bytesRead += rdr.Read(buffer, bytesRead, count - bytesRead)
    Loop Until bytesRead = count
    rdr.Close()
    memStream = New MemoryStream(buffer)
End Using

File.WriteAllBytes("c:\wallpaper.jpg", memStream.ToArray)

How can I fix these issues to download the wallpaper(s) properly?
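On the ContentLength point: HttpWebResponse.ContentLength is -1 when the server does not send a Content-Length header, so a buffer sized from it cannot work. A response stream can instead be read until it ends. The following is a minimal sketch of that idea; the helper name ReadToEnd is an assumption, and it does not address whether the request itself returns an image.

```vb
Imports System.IO

Module StreamHelpers
    ' Reads a stream to its end without knowing its length in advance.
    Function ReadToEnd(source As Stream) As Byte()
        Using buffer As New MemoryStream()
            source.CopyTo(buffer)   ' Stream.CopyTo requires .NET Framework 4.0 or later
            Return buffer.ToArray()
        End Using
    End Function
End Module
```

With this helper, the last step of the snippet above could become File.WriteAllBytes("c:\wallpaper.jpg", ReadToEnd(postresponse.GetResponseStream())).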

Answer 1:

Here is a complete solution to your question that exclusively uses HttpWebRequest and HttpWebResponse to simulate browser requests. I have commented much of the code to hopefully give you an idea of how this all works.

You must change the sUsername and sPassword variables to your own username/password to successfully log into the site.

Optional variables that you may want to change:

  • sDownloadPath: Currently set to the same folder as the application exe. Change this to the path where you want to download your images.
  • sImageResolution: Defaults to 1920x1080, which is what you specified in your original question. Change this value to any of the accepted resolution values on the website. Just a warning: I am not 100% sure that all images are available in the same resolutions, so changing this value may cause some images to be skipped if they do not exist in the desired resolution.
  • nMaxErrorsInSuccession: Set to 10 by default. Once logged in, the app keeps incrementing the image id and attempting to download a new image. Some ids do not contain an image, and this is normal: the image may have been deleted on the server, or it may not be available in the desired resolution. If the app fails to download an image nMaxErrorsInSuccession times in a row, it stops, on the assumption that we have reached the last of the images. You may have to increase this number if more than 10 consecutive images are deleted or not available in the selected resolution.
  • nCurrentID: Set to 1 by default. This is the image id used by the website to determine which image to serve to the client. The nCurrentID variable is incremented by one on each download attempt. Depending on time and circumstances you may not be able to download all images in one session. If this is the case, you can note which id you left off on and update this variable accordingly to start from that id next time. This is also useful when you have successfully downloaded all images and want to run the app later to download newer ones.
  • sUserAgent: Can be any user agent that you want. Currently using Firefox 35.0 for Windows 7. Note that some websites will function differently depending on what user agent you specify so only change this if you really need to emulate another browser.

NOTE: There is a 3 second pause strategically inserted at various points in the code. Some websites have hammer scripts that will block or even ban users who are browsing a site too quickly. Although removing these lines will speed up the time it takes to download all images, I would not recommend doing so.
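The stopping rule described for nMaxErrorsInSuccession, together with the pause between requests, can be looked at in isolation. This is only a sketch under assumptions: the function ScanIds and the tryDownload callback are hypothetical names, and the callback stands in for whatever actually performs the download.

```vb
Imports System
Imports System.Threading

Module ScanLoop
    ' tryDownload is a hypothetical callback that returns True when an image was saved.
    ' Returns the first id that was not attempted, which can seed the next run.
    Function ScanIds(startId As Integer, maxErrorsInSuccession As Integer,
                     pauseMilliseconds As Integer,
                     tryDownload As Func(Of Integer, Boolean)) As Integer
        Dim currentId As Integer = startId
        Dim errorsInSuccession As Integer = 0

        Do While errorsInSuccession < maxErrorsInSuccession
            If tryDownload(currentId) Then
                errorsInSuccession = 0      ' A success resets the streak
            Else
                errorsInSuccession += 1     ' Missing image: extend the streak
            End If

            currentId += 1
            Thread.Sleep(pauseMilliseconds) ' Pause so pages are not requested too quickly
        Loop

        Return currentId
    End Function
End Module
```

The returned id plays the role of nCurrentID for a later session.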

    Imports System.Net
    Imports System.IO

    Public Class Form2
        Const sUsername As String = "USERNAMEHERE"
        Const sPassword As String = "PASSWORDHERE"
        Const sImageResolution As String = "1920x1080"
        Const sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
        Const sMainURL As String = "http://www.cgwallpapers.com/"
        Const sCheckLoginURL As String = "http://www.cgwallpapers.com/login.php"
        Const sDownloadURLLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
        Const sDownloadURLRight As String = "&res="
        Private oCookieCollection As CookieCollection = Nothing
        Private nMaxErrorsInSuccession As Int32 = 10
        Private nCurrentID As Int32 = 1
        Private sDownloadPath As String = Application.StartupPath

        Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            StartScrape()
        End Sub

        Private Sub StartScrape()
            Try
                Dim bContinue As Boolean = True

                Dim sPostData(5) As String

                sPostData(0) = UrlEncode("action")
                sPostData(1) = UrlEncode("go")
                sPostData(2) = UrlEncode("email")
                sPostData(3) = UrlEncode(sUsername)
                sPostData(4) = UrlEncode("wachtwoord")
                sPostData(5) = UrlEncode(sPassword)

                If GetMethod(sMainURL) = True Then
                    If SetMethod(sCheckLoginURL, sPostData, sMainURL) = True Then
                        ' Login successful

                        Dim nErrorsInSuccession As Int32 = 0

                        Do Until nErrorsInSuccession >= nMaxErrorsInSuccession
                            If DownloadImage(sDownloadURLLeft, sDownloadURLRight, sMainURL, nCurrentID) = True Then
                                ' Always reset error count when we successfully download
                                nErrorsInSuccession = 0
                            Else
                                ' Add one to error count because there was no image at the current id
                                nErrorsInSuccession += 1
                            End If

                            nCurrentID += 1
                            Threading.Thread.Sleep(3000)    ' Wait 3 seconds to prevent loading pages too quickly
                        Loop

                        MessageBox.Show("Finished downloading images")
                    End If
                Else
                    MessageBox.Show("Error connecting to main site. Are you connected to the internet?")
                End If
            Catch ex As Exception
                MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End Try
        End Sub

        Private Function GetMethod(ByVal sPage As String) As Boolean
            Dim req As HttpWebRequest
            Dim resp As HttpWebResponse
            Dim stw As StreamReader
            Dim bReturn As Boolean = True

            Try
                req = HttpWebRequest.Create(sPage)
                req.Method = "GET"
                req.AllowAutoRedirect = False
                req.UserAgent = sUserAgent
                req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
                req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
                req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
                req.Headers.Add("Keep-Alive", "300")
                req.KeepAlive = True

                resp = req.GetResponse        ' Get the response from the server

                If req.HaveResponse Then
                    ' Save the cookie info

                    SaveCookies(resp.Headers("Set-Cookie"))

                    stw = New StreamReader(resp.GetResponseStream)
                    stw.ReadToEnd()    ' Read the response from the server, but we do not save it
                    stw.Close()        ' Release the connection
                Else
                    MessageBox.Show("No response received from host " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End If
            Catch exc As WebException
                MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                bReturn = False
            End Try

            Return bReturn
        End Function

        Private Function SetMethod(ByVal sPage As String, ByVal sPostData() As String, sReferer As String) As Boolean
            Dim bReturn As Boolean = False
            Dim req As HttpWebRequest
            Dim resp As HttpWebResponse
            Dim str As StreamWriter
            Dim sPostDataValue As String = ""
            Dim nInitialCookieCount As Int32 = 0

            Try
                req = HttpWebRequest.Create(sPage)
                req.Method = "POST"
                req.UserAgent = sUserAgent
                req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
                req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
                req.Referer = sReferer
                req.ContentType = "application/x-www-form-urlencoded"
                req.Headers.Add("Keep-Alive", "300")

                If oCookieCollection IsNot Nothing Then
                    ' Pass cookie info from the login page
                    req.CookieContainer = SetCookieContainer(sPage)
                End If

                str = New StreamWriter(req.GetRequestStream)

                If sPostData.Count Mod 2 = 0 Then
                    ' There is an even number of post names and values

                    For i As Int32 = 0 To sPostData.Count - 1 Step 2
                        ' Put the post data together into one string
                        sPostDataValue &= sPostData(i) & "=" & sPostData(i + 1) & "&"
                    Next i

                    sPostDataValue = sPostDataValue.Substring(0, sPostDataValue.Length - 1) ' This will remove the extra "&" at the end that was added from the for loop above

                    ' Post the data to the server

                    str.Write(sPostDataValue)
                    str.Close()

                    ' Get the response

                    nInitialCookieCount = req.CookieContainer.Count
                    resp = req.GetResponse

                    If req.CookieContainer.Count > nInitialCookieCount Then
                        ' Login successful
                        ' Save new login cookies

                        SaveCookies(req.CookieContainer)
                        bReturn = True
                    Else
                        MessageBox.Show("The email or password you entered are incorrect." & vbCrLf & vbCrLf & "Please try again.", "Unable to log in", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
                        bReturn = False
                    End If
                Else
                    ' Did not specify the correct amount of parameters so we cannot continue
                    MessageBox.Show("POST error.  Did not supply the correct amount of post data for " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End If
            Catch ex As Exception
                MessageBox.Show("POST error.  " & ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                bReturn = False
            End Try

            Return bReturn
        End Function

        Private Function DownloadImage(ByVal sPageLeft As String, sPageRight As String, sReferer As String, nCurrentID As Int32) As Boolean
            Dim req As HttpWebRequest
            Dim bReturn As Boolean = False
            Dim sPage As String = sPageLeft & nCurrentID.ToString & sPageRight & sImageResolution

            Try
                req = HttpWebRequest.Create(sPage)
                req.Method = "GET"
                req.AllowAutoRedirect = False
                req.UserAgent = sUserAgent
                req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
                req.Headers.Add("Accept-Encoding", "gzip, deflate")
                req.Headers.Add("Keep-Alive", "300")
                req.KeepAlive = True

                If oCookieCollection IsNot Nothing Then
                    ' Pass cookie info so that we remain logged in
                    req.CookieContainer = SetCookieContainer(sPage)
                End If

                ' Save file to disk

                Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
                    Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")

                    If sContentDisposition IsNot Nothing Then
                        ' There is an image to download

                        Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim

                        Using responseStream As IO.Stream = oResponse.GetResponseStream
                            Using fs As New IO.FileStream(System.IO.Path.Combine(sDownloadPath, sFilename), FileMode.Create, FileAccess.Write)
                                Dim buffer(2047) As Byte
                                Dim read As Integer

                                Do
                                    read = responseStream.Read(buffer, 0, buffer.Length)
                                    fs.Write(buffer, 0, read)
                                Loop Until read = 0

                                responseStream.Close()
                                fs.Flush()
                                fs.Close()
                            End Using

                            responseStream.Close()
                        End Using

                        bReturn = True
                    End If

                    oResponse.Close()
                End Using
            Catch exc As WebException
                MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                bReturn = False
            End Try

            Return bReturn
        End Function

        Private Function SetCookieContainer(sPage As String) As System.Net.CookieContainer
            Dim oCookieContainerObject As New System.Net.CookieContainer
            Dim oCookie As System.Net.Cookie

            For c As Int32 = 0 To oCookieCollection.Count - 1
                If IsDate(oCookieCollection(c).Value) = False Then
                    oCookie = New System.Net.Cookie
                    oCookie.Name = oCookieCollection(c).Name
                    oCookie.Value = oCookieCollection(c).Value
                    oCookie.Domain = New Uri(sPage).Host
                    oCookie.Secure = False
                    oCookieContainerObject.Add(oCookie)
                End If
            Next

            Return oCookieContainerObject
        End Function

        Private Sub SaveCookies(sCookieString As String)
            ' Convert cookie string to global cookie collection object

            oCookieCollection = New CookieCollection

            If sCookieString Is Nothing Then Return ' The server sent no Set-Cookie header

            Dim sCookieStrings() As String = sCookieString.Trim.Replace("path=/,", "").Replace("path=/", "").Split(";".ToCharArray())

            For Each sCookie As String In sCookieStrings
                ' Split only on the first "=" so that values containing "=" stay intact
                Dim sParts() As String = sCookie.Trim().Split("=".ToCharArray(), 2)

                If sParts.Length = 2 AndAlso sParts(0) <> "" Then
                    ' Flags without a value (e.g. HttpOnly) are skipped
                    oCookieCollection.Add(New Cookie(sParts(0), sParts(1)))
                End If
            Next
        End Sub

        Private Sub SaveCookies(oCookieContainer As CookieContainer)
            ' Convert cookie container object to global cookie collection object

            oCookieCollection = New CookieCollection

            For Each oCookie As System.Net.Cookie In oCookieContainer.GetCookies(New Uri(sMainURL))
                oCookieCollection.Add(oCookie)
            Next
        End Sub

        Private Function UrlEncode(ByRef URLText As String) As String
            Dim AscCode As Integer
            Dim EncText As String = ""
            Dim bStr() As Byte = System.Text.Encoding.ASCII.GetBytes(URLText)

            Try
                For i As Long = 0 To UBound(bStr)
                    AscCode = bStr(i)

                    Select Case AscCode
                        Case 48 To 57, 65 To 90, 97 To 122, 46, 95
                            EncText = EncText & Chr(AscCode)

                        Case 32
                            EncText = EncText & "+"

                        Case Else
                            If AscCode < 16 Then
                                EncText = EncText & "%0" & Hex(AscCode)
                            Else
                                EncText = EncText & "%" & Hex(AscCode)
                            End If

                    End Select
                Next i

                Erase bStr
            Catch ex As WebException
                MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End Try

            Return EncText
        End Function
    End Class
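The file name extraction in DownloadImage is done with Substring and Replace calls on the Content-Disposition header. As an alternative sketch, the System.Net.Mime.ContentDisposition class can parse that header; the helper name FileNameFromContentDisposition is an assumption, and the exact header format this site sends has not been verified here, so a malformed header could still raise a FormatException.

```vb
Imports System.IO
Imports System.Net.Mime

Module FileNameHelpers
    ' Extracts the filename parameter from a Content-Disposition header value.
    Function FileNameFromContentDisposition(headerValue As String) As String
        Dim disposition As New ContentDisposition(headerValue)

        ' Path.GetFileName drops any directory part the server might have included
        Return Path.GetFileName(disposition.FileName)
    End Function
End Module
```

In DownloadImage, sFilename could then be set from FileNameFromContentDisposition(sContentDisposition).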


Answer 2:

Private Function DownloadImage() As String
    Dim remoteImgPath As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080"
    Dim remoteImgPathUri As New Uri(remoteImgPath)
    Dim remoteImgPathWithoutQuery As String = remoteImgPathUri.GetLeftPart(UriPartial.Path)
    ' Note: for this URL the file name resolves to "viewwallpaper.php"
    Dim fileName As String = Path.GetFileName(remoteImgPathWithoutQuery)
    Dim localPath As String = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "LocalFolder\Images\Originals", fileName)
    ' Dispose the client once the download has finished
    Using webClient As New WebClient()
        webClient.DownloadFile(remoteImgPath, localPath)
    End Using
    Return localPath
End Function

I threw this together; I think it's the right direction.

Try
    Dim theFile As String = "c:\wallpaper.jpg"
    Dim fileName As String = Path.GetFileName(theFile)

    Dim ms = New MemoryStream(File.ReadAllBytes(theFile))

    Dim dataLengthToRead As Long = ms.Length
    Dim blockSize As Integer = If(dataLengthToRead >= 5000, 5000, CInt(dataLengthToRead))
    Dim buffer As Byte() = New Byte(blockSize - 1) {}

    Response.Clear()
    Response.ClearContent()
    Response.ClearHeaders()
    Response.BufferOutput = True

    ' Send a single Content-Disposition header; use "inline" instead to display the image in the browser
    Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName)

    ' Content-Length must be the size of the whole file, not of one block
    Response.AddHeader("Content-Length", dataLengthToRead.ToString())
    Response.ContentType = "image/jpeg"

    While dataLengthToRead > 0 AndAlso Response.IsClientConnected
        Dim lengthRead As Int32 = ms.Read(buffer, 0, blockSize)
        Response.OutputStream.Write(buffer, 0, lengthRead)
        Response.Flush()
        dataLengthToRead = dataLengthToRead - lengthRead
    End While

    Response.Flush()
    Response.Close()
Catch ex As Exception
    ' Errors are silently ignored here; log ex in real code
End Try