Web Scraping - VBA Search Parameters Not Working

Posted 2019-07-09 05:20

I am working on a web scraping project that scrapes ticketing information from a travel website.

I am currently running into an issue where the search parameters defined in my VBA code, which are then entered into the website's search form, are not working. The code I have written is provided below. For background: I read the to/from destinations from my Excel workbook (e.g. Beijing(北京)) and define the travel date in the same format the website expects (MM-DD-YYYY). However, when the code runs, the site does not seem to recognize the parameters and redirects me to a page saying "site under maintenance". The odd thing is that when I enter the same parameters manually, the site recognizes them and returns ticketing information.

Am I perhaps missing something? Do I have to update other values outside of "DepartureCity", "ArrivalCity", and "DepartDate"?

I also noticed that when I loop through multiple cities, the site searches for the previously entered parameters (i.e. when searching Shanghai -> Beijing, it returns Tianjin -> Beijing, which I had searched for earlier). Is there a way to automatically clear the search history/cache via VBA?

' store the from and to destinations as strings
sFrom = Range("C3").Value
sTo = Range("C4").Value

' "i" tracks the number of days out, as defined by the user (cntDays)
For i = 0 To cntDays
    dtRange = Date + i

    ' establish the date to pull train ticketing information for, as MM-DD-YYYY
    sDate = Format(dtRange, "mm-dd-yyyy")

    ' instantiate the oIE object
    Set oIE = CreateObject("InternetExplorer.Application")

    ' open Ctrip travel portal
    sURL = "http://english.ctrip.com/trains/#ctm_ref=nb_tn_top"
    With oIE
        .navigate sURL
        .Visible = True

        Do Until (.readyState = 4 And Not .Busy)
            DoEvents
        Loop

        ' search for particular entry
        .document.getElementsByName("DepartureCity")(0).Value = sFrom
        .document.getElementsByName("ArrivalCity")(0).Value = sTo
        .document.getElementsByName("DepartDate")(0).Value = sDate

        MsgBox sFrom
        MsgBox sTo
        MsgBox sDate

        ' click the "Search" button
        Set ElementCol = .document.getElementsByTagName("button")
        For Each btnInput In ElementCol
            If btnInput.innerText = "Search" Then
                btnInput.Click
                Exit For
            End If
        Next btnInput

        ' ensure page has been fully loaded
        Do Until (.readyState = 4 And Not .Busy)
            DoEvents
        Loop

        ' ... remainder of the loop body not shown ...
    End With
Next i
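The date-building step in the loop above produces one MM-DD-YYYY string for each day from today through cntDays days out. As a language-neutral illustration, here is a minimal Python sketch of the same idea; depart_dates is a hypothetical helper, and start/cnt_days stand in for VBA's Date and the user-defined cntDays.

```python
from datetime import date, timedelta
from typing import List


def depart_dates(start: date, cnt_days: int) -> List[str]:
    """Return MM-DD-YYYY strings for start through start + cnt_days (inclusive).

    The range is inclusive to mirror the VBA loop `For i = 0 To cntDays`,
    so cnt_days + 1 dates are returned.
    """
    return [
        # %m and %d always produce two digits, so no manual zero-padding is needed
        (start + timedelta(days=i)).strftime("%m-%d-%Y")
        for i in range(cnt_days + 1)
    ]


# Usage: three departure dates starting on a fixed day
print(depart_dates(date(2019, 7, 9), 2))
# → ['07-09-2019', '07-10-2019', '07-11-2019']
```

Using a fixed format pattern also handles month and year rollovers without any special cases.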

1 Answer
仙女界的扛把子
Post #2 · 2019-07-09 05:56

Looking at this a little closer, the site uses a GET request to perform the search.
As such, there is no need to load the page, populate the fields, and click the button.
You can set the values in the URL directly and bypass the initial page.

For instance, to search for trains going from Shanghai to Beijing on 12-9-2015, load the following URL...

http://english.ctrip.com/trains/List/Index?DepartureCity=shanghai%28%E4%B8%8A%E6%B5%B7%29&ArrivalCity=beijing%28%E5%8C%97%E4%BA%AC%29&DepartDate=12-9-2015&DepartureStation=%E4%B8%8A%E6%B5%B7&ArrivalStation=%E5%8C%97%E4%BA%AC
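Because the search is a plain GET request, the URL above can be assembled from its parts instead of being typed by hand. A minimal Python sketch, assuming the parameter values from the Shanghai -> Beijing example; it uses the standard library's urllib.parse.urlencode with quote_via=quote and reproduces the example URL exactly. Whether the site still accepts this URL has not been checked here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://english.ctrip.com/trains/List/Index"

# Parameter values copied from the Shanghai -> Beijing example.
# Dicts keep insertion order, so the query string keeps this order too.
params = {
    "DepartureCity": "shanghai(上海)",
    "ArrivalCity": "beijing(北京)",
    "DepartDate": "12-9-2015",
    "DepartureStation": "上海",
    "ArrivalStation": "北京",
}

# urlencode percent-encodes each value as UTF-8; quote_via=quote uses %20
# (not '+') for spaces and encodes '(' and ')' as %28 and %29.
search_url = BASE_URL + "?" + urlencode(params, quote_via=quote)
print(search_url)
```

On the VBA side, the equivalent string would simply be passed to .navigate in place of the portal URL, skipping the form-filling and button-click steps.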

Broken down, it looks like this:

http://english.ctrip.com/trains/List/Index?
DepartureCity=shanghai%28%E4%B8%8A%E6%B5%B7%29
ArrivalCity=beijing%28%E5%8C%97%E4%BA%AC%29
DepartDate=12-9-2015
DepartureStation=%E4%B8%8A%E6%B5%B7
ArrivalStation=%E5%8C%97%E4%BA%AC
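Decoding the percent-encoded values shows what each parameter actually contains: the city parameters are the romanized name followed by the Chinese name in parentheses, and the station parameters are the Chinese names alone. A short check with Python's urllib.parse.unquote, using the values from the breakdown above:

```python
from urllib.parse import unquote

# Encoded values copied from the breakdown above
encoded = {
    "DepartureCity": "shanghai%28%E4%B8%8A%E6%B5%B7%29",
    "ArrivalCity": "beijing%28%E5%8C%97%E4%BA%AC%29",
    "DepartDate": "12-9-2015",
    "DepartureStation": "%E4%B8%8A%E6%B5%B7",
    "ArrivalStation": "%E5%8C%97%E4%BA%AC",
}

# unquote decodes %XX escapes as UTF-8 by default
decoded = {name: unquote(value) for name, value in encoded.items()}
for name, value in decoded.items():
    print(f"{name} = {value}")
```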

From my own testing, I've determined that each of the above fields is required; if any of them is missing, you get the "maintenance" screen.

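This means you also need to know the station values. In this example they are the Chinese station names, percent-encoded: %E4%B8%8A%E6%B5%B7 is 上海 (Shanghai) and %E5%8C%97%E4%BA%AC is 北京 (Beijing).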
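Since all five parameters appear to be required, it can help to check a parameter set before building the URL. A minimal sketch; REQUIRED_PARAMS and missing_params are hypothetical names, and the list of required fields is based only on the testing described above, not on any documented API.

```python
# Hypothetical: the five parameter names seen in the example URL, all of which
# appeared to be required in testing (otherwise the "maintenance" page appears).
REQUIRED_PARAMS = (
    "DepartureCity",
    "ArrivalCity",
    "DepartDate",
    "DepartureStation",
    "ArrivalStation",
)


def missing_params(params: dict) -> list:
    """Return the required parameter names that are absent or empty."""
    return [name for name in REQUIRED_PARAMS if not params.get(name)]


# Usage: only the three fields the original VBA code filled in
query = {
    "DepartureCity": "shanghai(上海)",
    "ArrivalCity": "beijing(北京)",
    "DepartDate": "12-9-2015",
}
print(missing_params(query))  # ['DepartureStation', 'ArrivalStation']
```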

In addition, you must include the Chinese characters in the city names, percent-encoded as UTF-8, for example:

shanghai%28%E4%B8%8A%E6%B5%B7%29
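The city value follows a "romanized(Chinese)" pattern, which is inferred from the example rather than documented. A minimal sketch of producing such a value with Python's urllib.parse.quote; encode_city is a hypothetical helper.

```python
from urllib.parse import quote


def encode_city(romanized: str, chinese: str) -> str:
    """Build a value like 'shanghai(上海)' and percent-encode it as UTF-8.

    The 'romanized(Chinese)' format is assumed from the example URL.
    """
    # safe="" leaves only unreserved characters (letters, digits, -._~) unescaped
    return quote(f"{romanized}({chinese})", safe="")


print(encode_city("shanghai", "上海"))  # shanghai%28%E4%B8%8A%E6%B5%B7%29
```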
