Excel from web scraping

2020-02-16 05:12发布

I want to pull all 6 of these tables from this website into my workbook. (vs All,vs PG,vs SG,vs SF,vs PF,vs C) When I try using the from web option in excel and selecting the table it just pulls in the headers. Why is that? Currently I have a paste now button and I goto the website, copy it and click the "Paste" button I created with a macro to clear current info and paste new values. I would like to eliminate me having to manually goto the website and copy the table. Is there another way to do it besides "From Web"

1条回答
对你真心纯属浪费
2楼-- · 2020-02-16 05:22

Make sure you are selecting the right table. There are two table elements. The first is just headers. The second is headers + info. I am not sure you can use this method to get all the tabs though as the URL doesn't change and the content is javascript updated. You can see whether the API has anything to offer though it is gate-kept by staff who want to speak to you before issuing an API key.

Any easy way is to go VBE > Tools > References > Add a reference to Microsoft Internet Controls then use Internet Explorer to navigate to the page.

You can use a CSS selector to target the table by its id and another CSS selector class selector to target all the tab links so as to click them to update the table for each tab.

Option Explicit

Public Sub GetInfo()
    Dim IE As New InternetExplorer, hTable As HTMLTable
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    Dim clipboard As Object, tabs As Object, iTab As Long
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

    With IE
        .Visible = True
        .navigate "https://swishanalytics.com/optimus/nba/daily-fantasy-team-defensive-ranks-position"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set tabs = .document.querySelectorAll(".position.fastClick")

        For iTab = 0 To tabs.Length - 1

            If iTab > 0 Then
                tabs.item(iTab).Click
                While .Busy Or .readyState < 4: DoEvents: Wend

            End If

            clipboard.SetText .document.querySelector("#stat-table").outerHTML
            clipboard.PutInClipboard

            With ws
                .Cells(GetLastRow(ws, 1) + 2, 1).PasteSpecial
            End With
        Next
        .Quit
    End With

End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

If you don't want to use the clipboard to copy paste the table you can loop its rows and table cells within rows.

Option Explicit

Public Sub GetInfo()
    Dim IE As New InternetExplorer, ws As Worksheet, tabs As Object, iTab As Long
    Application.ScreenUpdating = False
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With IE
        .Visible = True
        .navigate "https://swishanalytics.com/optimus/nba/daily-fantasy-team-defensive-ranks-position"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set tabs = .document.querySelectorAll(".position.fastClick")

        For iTab = 0 To tabs.Length - 1
            If iTab > 0 Then
                tabs.item(iTab).Click
                While .Busy Or .readyState < 4: DoEvents: Wend
            End If

            WriteTable .document.querySelector("#stat-table"), GetLastRow(ws, 1) + 2, ws
        Next
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)

    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
    r = startRow
    With ws
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            .Cells(startRow, columnCounter) = header.innerText
        Next header
        startRow = startRow + 1
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody
            Set tRow = tSection.getElementsByTagName("tr")
            For Each tr In tRow
                r = r + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell
                    .Cells(r, C).Value = td.innerText
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub
查看更多
登录 后发表回答