web scraping using excel and VBA

2019-09-20 05:46发布

问题:

i wrote my VBA code in excel sheet as below but it is not scrape data for me and also i don't know why please any one help me. it gave me reullt as "click her to read more" onlyi want to scrape enitre data such as first name last name state zip code and so on

Sub extractTablesData()
    Dim IE As Object, obj As Object
    Dim myState As String
    Dim r As Integer, c As Integer, t As Integer
    Dim elemCollection As Object

    Set IE = CreateObject("InternetExplorer.Application")

    myState = InputBox("Enter the city where you wish to work")

    With IE

        .Visible = True
        .navigate ("http://www.funeralhomes.com/go/listing/Search?  name=&city=&state=&country=USA&zip=&radius=")

        While IE.readyState <> 4
            DoEvents
        Wend

        For Each obj In IE.document.all.item("state").Options
            If obj.innerText = myState Then
                obj.Selected = True
            End If
        Next obj

        IE.document.getElementsByValue("Search").item.Click

        Do While IE.Busy: DoEvents: Loop

        ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents

        Set elemCollection = IE.document.getElementsByTagName("TABLE")

        For t = 0 To (elemCollection.Length - 1)

            For r = 0 To (elemCollection(t).Rows.Length - 1)
                For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                    ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                Next c
            Next r
        Next t

    End With
    Set IE = Nothing
End Sub

回答1:

Using the same URL as the answer already given you could alternatively select with CSS selectors to get the elements of interest, and use split to get just the names and address parts from the text. We can also do away with the browser altogether to get faster results from first results page.


Business name:

You can get the name with the following selector (using paid listing example):

div.paid-listing .listing-title

This selects (sample view)

Try


Address info:

The associated descriptive information can be retrieved with the selector:

div.paid-listing .address-summary

And then using split we can parse this into just the address information.


Code:

Option Explicit
Public Sub GetTitleAndAddress()
    Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
    Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
    Set oHtml = New HTMLDocument

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", URL, False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
    Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")

    With Worksheets("Sheet3")
        .UsedRange.ClearContents
        For i = 0 To nodeList1.Length - 1
            .Range("A" & i + 1) = nodeList1.Item(i).innerText
            .Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
        Next i
    End With
End Sub

Example output:



回答2:

Yeah, without an API, this can be very tricky at best, and very inconsistent at worst. For now, you can try the script below.

Sub DumpData()

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

URL = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"

'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
   DoEvents
Loop

RowCount = 1

With Sheets("Sheet1")
   .Cells.ClearContents
   RowCount = 1
   For Each itm In IE.document.all
      If itm.classname Like "*free-listing*" Or itm.classname Like "*paid-listing*" Then
        .Range("A" & RowCount) = itm.classname
        .Range("B" & RowCount) = Left(itm.innertext, 1024)
            RowCount = RowCount + 1
      End If

   Next itm
End With
End Sub

You probably want some kind of input box to capture the city and state and radius from the user, or capture those variable in cells in your worksheet.

Notice, the '%20' is a space character.

I got this idea from a friend of mine, Joel, a long time ago. That guy is great!