Web Scraping using VBA from courier website

2020-01-19 00:46发布

I want to have a record of parcels having tracking numbers in excel column A and details of the available field on other columns so that whenever I press the button to run module it updates me about the parcel details fetching from the website. The website I am targeting is "http://trackandtrace.courierpost.co.nz/Search/". I have made the code to embed tracking number after that link and to fetch other fields but the code is not fetching any data it just opens up the link using internet explorer. The error i get is

enter image description here

Here is my code:

Sub Yellowcom()
    'Dim ieObj As InternetExplorer
    Dim htmlELe As IHTMLElement
    Dim HTML As HTMLDocument
    Dim i As Integer
    Dim x As Integer


    Dim URL As String
    Dim URLParameter As String
    Dim page As Long
    Dim links As Object
    Dim IE As Object


    i = 1

    Set IE = CreateObject("InternetExplorer.Application")
    'Set ieObj = New InternetExplorer
    IE.Visible = True
    URL = "http://trackandtrace.courierpost.co.nz/search/"
    'Application.Wait Now + TimeValue("00:00:05")
    x = 1

    For page = 2 To 10

        If page > 1 Then URLParameter = Sheet1.Cells(x, 1).Value

        IE.navigate URL & URLParameter

        ' Wait for the browser to load the page
        Do Until IE.readyState = 4

            DoEvents

        Loop

       Set HTML = IE.document
Set OrganicLinks = HTML.getElementsByClassName("search-results organic")
Set links = OrganicLinks.Item(0).getElementsByClassName("info")



    For Each htmlELe In links

        With ActiveSheet
            .Range("A" & i).Value = htmlELe.Children(0).textContent

            On Error Resume Next
.Range("B" & i).Value = htmlELe.getElementsByClassName("track-visit-website")(0).href
On Error GoTo 0


            On Error Resume Next
.Range("C" & i).Value = htmlELe.getElementsByClassName("info-section info-secondary")(0).href
On Error GoTo 0


            '.Range("B" & i).Value = htmlELe.getElementsByTagName("a")(0).href
            '.Range("C" & i).Value = htmlELe.Children(2).textContent
            .Range("D" & i).Value = htmlELe.Children(2).querySelector("a[href]")
             'links2 = htmlELe.getElementsByClassName("links")(1)
           ' .Range("D" & i).Value = links2.href


        End With
    i = i + 1

    x = x + 1
    Next htmlELe


    Next page

    IE.Quit
    Set IE = Nothing

    End Sub

标签: excel vba
2条回答
来,给爷笑一个
2楼-- · 2020-01-19 01:20

I highly recommend you use background objects to send info to websites, e.g. the following MSXML2 objects can be used to send GET and POST requests, in the following code I'm sending a request to your website with the search code (pulled from values in column A) and then putting your required delivery status and time xml in column B

Sub demoMsxml2()
  Dim mySearchCode As String
  Dim myConnection As Object
  Dim Status As String
  Dim i As Long
  Dim lastRow As Long
  lastRow = Sheet1.Cells(Sheet1.Rows.Count, 1).End(xlUp).Row
  For i = 1 To lastRow
    mySearchCode = Sheet1.Range("A" & i).Value2
    Set myConnection = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    Call myConnection.Open("GET", "http://trackandtrace.courierpost.co.nz/Search/" + mySearchCode)
    myConnection.send
    Sheet1.Range("B" & i).Value2 = ExtractString(Trim(Replace(myConnection.responseText, vbCrLf, "")), "<li class=""status""><span", "</li>")
  Next i
End Sub

Function ExtractString(parentString As String, beginsWith As String, endsWith As String) As String
  Dim a As Long: a = InStr(1, parentString, beginsWith)
  Dim b As Long: b = InStr(a, parentString, endsWith)
  If (a <> 0 And b <> 0) Then ExtractString = Trim(Mid(parentString, a, b - a)) Else ExtractString = ""
End Function

Instead of putting the text into column B you can just scrape your data from it. Using this method means you don't have to see anything on the screen, no creating internet explorer instances, no waiting for pages to load etc. it's all handled automatically.

查看更多
Lonely孤独者°
3楼-- · 2020-01-19 01:24

Please read all comments. Let us know where there are problems. Otherwise it will not work I think.

Option Explicit

Sub Yellowcom()
  Const basicURL As String = "http://trackandtrace.courierpost.co.nz/search/" 'Always the same is a constant

  Dim IE As Object
  Dim OrganicLinks As Object
  Dim htmlELe As Object
  Dim links As Object
  Dim URL As String
  Dim urlTrackingNumber As String 'It isn't a parameter of an url so I use this variable instead of 'URLParameter'. It's an addition
                                  '[A list of parameters beginns with a qustionmark (?) and all following with an ampersand (&)]
  Dim trackingNumber As Long      'There is no pagination. So use what it's about and not "page" as variable
  Dim currentRow As Integer       'Use sounding names for variables. You (and everybody else) can read your code better. WTF is x?
  Dim currentColumn As Integer    'Use sounding names for variables. You (and everybody else) can read your code better. WTF is i?
  Dim firstColumn As Integer      'A dynamic for the first used column for data

  'Initialize row and column to write first
  currentRow = 2              'First row where to write data
  firstColumn = 1             'First column where to write data
  currentColumn = firstColumn 'Current column at this point is first column

  'Loop over all tracking numbers from first row of tracking numbers till the last row in table
  For trackingNumber = currentRow To ActiveSheet.Cells(Rows.Count, 1).End(xlUp).row 'I hope ActiveSheet is correct? Same question as above (One sheet, two sheets?)
    urlTrackingNumber = Sheets(1).Cells(currentRow, currentColumn).Value            'Real Sheet1? Difference to ActiveSheet? Than 'currentColumn' will work here too
                                                                                    'I ask, because you use same variable for rows in both sheets in your code
                                                                                    '[Same question as above (One sheet, two sheets?)]
    URL = basicURL & urlTrackingNumber

    'Initialize Internet Explorer, set visibility,
    'call URL and wait until page is fully loaded
    Set IE = CreateObject("internetexplorer.application")
    IE.Visible = True
    IE.navigate URL
    Do Until IE.ReadyState = 4: DoEvents: Loop
    'Use the following line if needed (AJAX content)
    'Application.Wait Now + TimeValue("00:00:05")

    Set OrganicLinks = IE.document.getElementsByClassName("search-results organic") 'Really existing?

    'Check if ;-)
    If Not OrganicLinks Is Nothing Then
      Set links = OrganicLinks.Item(0).getElementsByClassName("info") 'I don't know what this should do, because I don't know anything about OrganicLinks

      'I can't say anything about your following code, because I'am in the html code nirvana at now
      'What I can tell you is 'On Error Resume Next' and 'On Error GoTo 0' is used wrong here. But if you want let's talk about that later
      For Each htmlELe In links
        'No solutions for the lines in this code block
        With ActiveSheet
          .Range("A" & currentColumn).Value = htmlELe.Children(0).textContent

          On Error Resume Next
          .Range("B" & currentColumn).Value = htmlELe.getElementsByClassName("track-visit-website")(0).href
          On Error GoTo 0

          On Error Resume Next
          .Range("C" & currentColumn).Value = htmlELe.getElementsByClassName("info-section info-secondary")(0).href
          On Error GoTo 0

          '.Range("B" & i).Value = htmlELe.getElementsByTagName("a")(0).href
          '.Range("C" & i).Value = htmlELe.Children(2).textContent
          .Range("D" & currentColumn).Value = htmlELe.Children(2).querySelector("a[href]")
          'links2 = htmlELe.getElementsByClassName("links")(1)
          ' .Range("D" & i).Value = links2.href
        End With
        currentRow = currentRow + 1
        currentColumn = currentColumn + 1
      Next htmlELe
    Else
      'No organic links
      MsgBox "No html element with css class 'search-results organic'" 'This is for developers only
    End If

    'The IE is a little diva. So we will left her to use her (most hope) from (nearly) the same situation than before
    'Clean up
    IE.Quit
    Set IE = Nothing
    Set htmlELe = Nothing
    Set OrganicLinks = Nothing
    Set links = Nothing
  Next trackingNumber
End Sub
查看更多
登录 后发表回答