Scraping the source code using VBA-Macros

2019-07-14 05:44发布

问题:

I need to crawl the price values from the price comparison website (product link: https://www.toppreise.ch/prod_488002.html). I am not able to scrape. see the highlighted price in the image that I want to capture:

Please help me how to crawl this page.

PS: toppreise.ch will not be accessible in many countries so use VPN

I am using the below code:

Private Sub SiteInfo_Click()
Dim strhtml
On Error Resume Next
ThisWorkbook.Sheets("Data Mining").Activate
Sheets("Data Mining").Range("B1").Select
Set xmlHttp = Nothing
Set xmlHttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    StrUrl = ""
    StrUrl = Sheets("Data Mining").Range("B1").Value
    xmlHttp.Open "GET", StrUrl, False
    xmlHttp.Send
    strhtml =xmlHttp.responseText
    END Sub

When I run above code I am only getting below response text . It doesn't gives the whole page. (You can check the source code by using the product link or view here https://www.dropbox.com/s/ah80jt7a25xcicp/source%20code.txt?dl=0 )

<html><head>
        <script type="text/javascript" src="//en.toppreise.ch/js/tpjs.js"></script>
        <script type="text/javascript" src="//en.toppreise.ch/js/afxp.js"></script>
        <script type="text/javascript" src="//en.toppreise.ch/js/jquery.min.js"></script>
        <script type="text/javascript" src="//en.toppreise.ch/js/jquery-ui-autocomplete.min.js"></script>
    </head><body>...   

回答1:

This code works, Thanks SIM

Sub Get_Price()
Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
Dim post As HTMLDivElement

With HTTP
.Open "GET", "https://www.toppreise.ch/index.php?a=488002", False
.send
HTML.body.innerHTML = .responseText
End With

For Each post In HTML.getElementsByClassName("altLinesOdd")
With post.getElementsByTagName("a")
     If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
End With
With post.getElementsByClassName("spaceVert nobreak")
     If .Length Then Cells(R, 2) = .Item(0).innerText
End With
Next post
End Sub