I am working on a web-scraping project that scrapes ticketing information from a travel website.
I am running into an issue where the search parameters that my VBA code enters into the website are not being recognized. The code I have written is below. For background: I read the to/from destinations from my Excel workbook (e.g. Beijing(北京)) and build the travel date in the same format the website expects (MM-DD-YYYY). However, when the code runs, the site does not seem to recognize the parameters and redirects me to a page saying "site under maintenance". The odd thing is that when I enter the same parameters manually, the site recognizes them and returns ticketing information.
Am I missing something? Do I have to set other fields besides "DepartureCity", "ArrivalCity", and "DepartDate"?
I also noticed that when I loop through multiple cities, the site reuses the parameters from a previous search (for example, if I search Shanghai -> Beijing, I get results for Tianjin -> Beijing, which I had searched for earlier). Is there a way to automatically clear the search history/cache via VBA?
```vba
' save from and to destinations under a defined string
sFrom = Range("C3").Value
sTo = Range("C4").Value

' "i" tracks the number of days out, as defined by the user (cntDays is set elsewhere)
For i = 0 To cntDays
    dtRange = Date + i

    ' establish the date to pull train ticketing information on (MM-DD-YYYY, zero-padded)
    sDate = Format(dtRange, "mm-dd-yyyy")

    ' instantiate the oIE object
    Set oIE = CreateObject("InternetExplorer.Application")

    ' open Ctrip travel portal
    sURL = "http://english.ctrip.com/trains/#ctm_ref=nb_tn_top"

    With oIE
        .navigate sURL
        .Visible = True
        Do Until (.readyState = 4 And Not .Busy)
            DoEvents
        Loop

        ' search for particular entry
        .document.getElementsByName("DepartureCity")(0).Value = sFrom
        .document.getElementsByName("ArrivalCity")(0).Value = sTo
        .document.getElementsByName("DepartDate")(0).Value = sDate
        MsgBox sFrom
        MsgBox sTo
        MsgBox sDate

        Set ElementCol = .document.getElementsByTagName("button")
        For Each btnInput In ElementCol
            If btnInput.innerText = "Search" Then
                btnInput.Click
                Exit For
            End If
        Next btnInput

        ' ensure page has been fully loaded
        Do Until (.readyState = 4 And Not .Busy)
            DoEvents
        Loop

        ' (rest of the loop body omitted)
    End With
Next i
```
Looking at this more closely, the site performs the search with a GET request.
As a result, there is no need to load the page, populate the fields, and click the button.
You can put the values in the URL directly and bypass the initial page.
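With a GET request, the form fields simply become `name=value` pairs in the query string. The sketch below is in Python only for brevity (the same string can be built in VBA and passed to `.navigate`); it uses the standard library's `urllib.parse.urlencode`. The base path `/trains/List/Index` and the five field names are taken from the example URL in this answer; the function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Search path taken from the example URL in this answer.
SEARCH_URL = "http://english.ctrip.com/trains/List/Index"


def build_search_url(params: dict) -> str:
    """Append form fields to the search path as a GET query string.

    urlencode() percent-encodes each value as UTF-8, so Chinese station
    names are sent the same way a browser would send them. Dict insertion
    order is preserved in the resulting query string.
    """
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url({
        "DepartureCity": "shanghai(上海)",
        "ArrivalCity": "beijing(北京)",
        "DepartDate": "12-9-2015",
        "DepartureStation": "上海",
        "ArrivalStation": "北京",
    }))
```

Because every search starts from a freshly built URL rather than from whatever the page's form last held, this approach should also avoid the stale-parameter problem described in the question.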
For instance, to search for trains from Shanghai to Beijing on 12-9-2015 (December 9, 2015), load the following URL:
http://english.ctrip.com/trains/List/Index?DepartureCity=shanghai%28%E4%B8%8A%E6%B5%B7%29&ArrivalCity=beijing%28%E5%8C%97%E4%BA%AC%29&DepartDate=12-9-2015&DepartureStation=%E4%B8%8A%E6%B5%B7&ArrivalStation=%E5%8C%97%E4%BA%AC
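Broken down, the query string looks like this:

- DepartureCity = shanghai(上海)
- ArrivalCity = beijing(北京)
- DepartDate = 12-9-2015
- DepartureStation = 上海
- ArrivalStation = 北京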
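That breakdown can also be produced programmatically, which is a handy way to see exactly which fields the browser sends when you run a search manually. A minimal Python sketch using `urllib.parse.urlsplit` and `parse_qsl`, which decode the `%XX` sequences back into UTF-8 text; `EXAMPLE_URL` is the URL above and `query_fields` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlsplit

# The Shanghai -> Beijing example URL from this answer.
EXAMPLE_URL = (
    "http://english.ctrip.com/trains/List/Index?"
    "DepartureCity=shanghai%28%E4%B8%8A%E6%B5%B7%29"
    "&ArrivalCity=beijing%28%E5%8C%97%E4%BA%AC%29"
    "&DepartDate=12-9-2015"
    "&DepartureStation=%E4%B8%8A%E6%B5%B7"
    "&ArrivalStation=%E5%8C%97%E4%BA%AC"
)


def query_fields(url: str) -> list[tuple[str, str]]:
    """Return the (name, value) pairs of a URL's query string, decoded as UTF-8."""
    return parse_qsl(urlsplit(url).query)


if __name__ == "__main__":
    for name, value in query_fields(EXAMPLE_URL):
        print(f"{name} = {value}")
```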
From my own testing, I've determined that each of the above fields is required; leave any of them out and you get the "maintenance" screen.
This means you also need to know the station values (DepartureStation and ArrivalStation, e.g. 上海 and 北京), not just the city names.
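Since all five fields are needed, one option is to keep a lookup table from the English city name (as stored in the workbook) to the Chinese station name and derive every field from it. This Python sketch assumes such a table: `STATIONS`, `search_fields`, and the Tianjin entry are hypothetical; only the Shanghai and Beijing values are confirmed by the example URL. It also assumes the unpadded month-day-year date seen in that URL (12-9-2015).

```python
from datetime import date

# Hypothetical lookup table: English city name -> Chinese station name.
# Only "shanghai" and "beijing" are confirmed by the example URL; add the
# names the site itself shows for any other stations you need.
STATIONS = {
    "shanghai": "上海",
    "beijing": "北京",
    "tianjin": "天津",
}


def search_fields(from_city: str, to_city: str, when: date) -> dict[str, str]:
    """Build all five query fields the search page appears to require."""
    try:
        dep = STATIONS[from_city.lower()]
        arr = STATIONS[to_city.lower()]
    except KeyError as missing:
        # Fail early instead of sending an incomplete query (maintenance page).
        raise ValueError(f"no station name known for {missing}") from None
    return {
        "DepartureCity": f"{from_city.lower()}({dep})",
        "ArrivalCity": f"{to_city.lower()}({arr})",
        # Month-day-year without zero padding, as in the example URL.
        "DepartDate": f"{when.month}-{when.day}-{when.year}",
        "DepartureStation": dep,
        "ArrivalStation": arr,
    }


if __name__ == "__main__":
    print(search_fields("Shanghai", "Beijing", date(2015, 12, 9)))
```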
In addition, you must supply the Chinese characters in the names, percent-encoded as UTF-8. For example, shanghai(上海) becomes:
shanghai%28%E4%B8%8A%E6%B5%B7%29
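Each `%XX` pair is one byte of the character's UTF-8 encoding: 上 and 海 take three bytes each, and the parentheses become `%28` and `%29`. A short Python check with `urllib.parse.quote` confirms the string above. In VBA, Excel 2013 and later expose a `WorksheetFunction.EncodeURL` method that may serve the same purpose; compare its output against this string before relying on it.

```python
from urllib.parse import quote, unquote

name = "shanghai(上海)"

# quote() encodes each non-ASCII character as its UTF-8 bytes (%XX each).
# "(" and ")" are not in quote()'s always-safe set, so they become %28/%29;
# safe="" additionally stops "/" from being left unencoded.
encoded = quote(name, safe="")

# Byte-level view of the two Chinese characters: three UTF-8 bytes each.
utf8_bytes = "上海".encode("utf-8")

if __name__ == "__main__":
    print(encoded)
    print(utf8_bytes.hex(" ").upper())
    print(unquote(encoded))  # decodes back to the original name
```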