Amazon Sales Data (with Excel VBA)

2019-01-27 04:35发布

问题:

I'm trying to obtain the result number (in the HTML code) of each keyword I search by means of Excel VBA. Narrowing down the criteria by className, id, and data-asin, but that last one is proving to be tricky since VBA doesn't support that reference library yet.

What this code is structured to do:

  1. Go onto amazon.com and go to the search bar.

  2. The loop starts with column C. Pull a search term from the SearchTerm1 column and search.

  3. Once the results page is loaded, try and find the specified product by className, ID (both found in the HTML code), and ASIN number (this number is pulled from column B in order to match the data-asin value on the search results page). Without all 3 criteria, excel won't be able to find the product if it's listed on the first results page.

  4. If the product is found on the first page, grab its result position (eg. "result_0" for the first result and result_1 for the second) and puts it in column D (SRank1).
  5. If the product is not found, the loop keeps going anyways until the search term column is empty.
  6. If the product is not found on the first page, continue advancing pages and searching for the specified product in order to grab the "search rank" of the item.
  7. The same steps are repeated for columns SearchTerm2, 3, and 4.

The screenshot below is the code only pulling based on the className and ID criteria, and it pulls the last product result from the page, which is not what my goal for analysation of how products are doing.

The code included only pulls the product rank if the product is the first result on the search page, which means something is finally working, but is missing a step or two to grab all of the product positions from the page.

Any help or push in the right direction would be highly appreciated. I wish VBA were more versatile for these kinds of sales research things. It's done wonders so far, but I may be reaching its limit. Code is below.

Sub AmazonSearchRank()

    Dim MyHTML_Element As IHTMLElement
    Dim MyURL As String

    Dim AASearchRank As Workbook
    Dim AAws As Worksheet
    Dim InputSearchOrder As HTMLInputElement
    Dim elems As IHTMLElementCollection
    Dim TDelement As HTMLTableCell

    Dim InputSearchButton As HTMLInputButtonElement
    Dim IE As InternetExplorer
    Dim AASearchTerms As Workbook
    Dim SearchTermsSheet As Worksheet

    Dim x As Integer
    Dim i As Long

    MyURL = "https://www.amazon.com"
    Set IE = New InternetExplorer
    With IE
        .Silent = True
        .Navigate MyURL
        .Visible = True
        Do
            DoEvents
        Loop Until .ReadyState = READYSTATE_COMPLETE
    End With
    Set HTMLDoc = IE.Document

    Set AASearchRank = Application.Workbooks.Open("C:\Users\CompanyName\Desktop\Automation Anywhere\Sample_Items_For_SearchRank.xls")
    Set AAws = AASearchRank.Worksheets("Sheet1")

    Set InputSearchButton = HTMLDoc.getElementById("nav-search-submit-text")
    Set InputSearchOrder = HTMLDoc.getElementById("twotabsearchbox")
    If Not InputSearchOrder Is Nothing Then
        InputSearchButton.Click
        Do
            DoEvents
        Loop Until IE.ReadyState = READYSTATE_COMPLETE
    End If

    x = 2
    If AAws.Range("D" & x).Value = "" Then
        Do Until AAws.Range("B" & x) = ""
            Set InputSearchOrder = HTMLDoc.getElementById("twotabsearchtextbox")
            InputSearchOrder.Value = AAws.Range("C" & x)

            Set InputSearchButton = HTMLDoc.getElementsByClassName("nav-input")(0)
            InputSearchButton.Click
              Do
                DoEvents
            Loop Until IE.ReadyState = READYSTATE_COMPLETE
            Application.Wait (Now + TimeValue("0:00:05"))

            Set elems = HTMLDoc.getElementsByClassName("s-result-item celwidget")
            i = 2
            For Each TDelement In elems
                If TDelement.className = "s-result-item celwidget" And InStr(TDelement.ID, "result") InStr(TDelement.innerHTML, AAws.Range("B" & x).Value) Then
                    AAws.Range("D" & x).Value = TDelement.ID
                    i = i + 1
                End If
            Next
        x = x + 1
        Loop
    End If

End Sub

回答1:

Here is the example which downloads products from Amazon for each search query presented on the sheet Terms, and populates the sheet Products with ASINs and descriptions. It uses XHR, so IE isn't needed. The code is as follows:

Sub Test()
    lngRow = 1
    ' search each term
    For Each strTerm In Sheets("Terms").UsedRange
        lngPage = 1
        Do
            ' HTTP GET request of the search result page
            strUrl = "https://www.amazon.com/s/ref=nb_sb_noss_2?page=" & lngPage & "&keywords=" & EncodeUriComponent(strTerm)
            Set objXHR = CreateObject("MSXML2.XMLHttp")
            objXHR.Open "GET", strUrl, False
            objXHR.Send
            strResp = objXHR.ResponseText
            ' split response to array by items
            arrResp = Split(strResp, "<li id=""result_")
            ' process each item on the page
            For i = 1 To UBound(arrResp)
                strItem = arrResp(i)
                ' extract ASIN
                strTmp = Split(strItem, "data-asin=""")(1)
                strTmp = Split(strTmp, """")(0)
                Sheets("Products").Cells(lngRow, 1).NumberFormat = "@"
                Sheets("Products").Cells(lngRow, 1).Value = strTmp
                ' extract the product description
                strTmp = Split("<li id=""result_" & strItem, "</li>")(0) & "</li>"
                Sheets("Products").Cells(lngRow, 2).Value = GetInnerText(strTmp)
                ' show current item
                Sheets("Products").Cells(lngRow, 1).Select
                ' next row
                lngRow = lngRow + 1
            Next
            ' adjust sheet
            Sheets("Products").Columns.AutoFit
            Sheets("Products").Rows.AutoFit
            ' next page
            lngPage = lngPage + 1
        Loop Until UBound(arrResp) = 0 ' empty search result
    Next
End Sub

Function EncodeUriComponent(strText)
    Static objHtmlfile As Object
    If objHtmlfile Is Nothing Then
        Set objHtmlfile = CreateObject("htmlfile")
        objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
    End If
    EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function

Function GetInnerText(strHtmlContent)
    Dim objHtmlFile, objBody
    Set objHtmlFile = CreateObject("htmlfile")
    objHtmlFile.write strHtmlContent
    Set objBody = objHtmlFile.getElementsByTagName("body")(0)
    GetInnerText = Trim(objBody.innerText)
End Function

I placed on the Terms sheet:

Results on the Product sheet contain 571 items:

It's not a complete answer, but I hope it helps you.



回答2:

Through trial and error, I finally solved this bloody thing. I just had to take out part of the code which included the "And InStr(TDelement.ID, "result")" and then everything ran smooth as butter.