VBA - Number of Google News Search Results

Posted 2019-07-28 00:45

I have a cell containing a term that I would like to search for in Google News, and I want the code to return the number of results for that search. Currently I have the code below, which I found elsewhere on the site. It does not use Google News, and even so I sometimes get

runtime error -2147024891 (80070005)

after 70 or so searches, and then I can't run it again.

Sub HawkishSearch()

Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim start_time As Date
Dim end_time As Date

lastRow = Range("B" & Rows.Count).End(xlUp).Row

Dim i As Long
Dim str_text As String

start_time = Time
Debug.Print "start_time:" & start_time

For i = 2 To lastRow

    url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText

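    ' A missing element does not necessarily mean zero results: the page layout
    ' may differ, or the request may have been refused.
    If html.getElementById("resultStats") Is Nothing Then
        str_text = "0 Results"
    Else
        str_text = html.getElementById("resultStats").innerText
    End If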
    Cells(i, 3) = str_text
    DoEvents
Next

end_time = Time
Debug.Print "end_time:" & end_time

Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
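The code above concatenates the raw cell value into the URL, so a term containing spaces, `&` or `#` produces a malformed query. Below is a minimal sketch of a hypothetical `EncodeQuery` helper that percent-encodes the term first, assuming ASCII-only search terms (on Excel 2013+ for Windows, `WorksheetFunction.EncodeURL` is a built-in alternative). Encoding will not stop Google refusing rapid automated requests; 80070005 is the Windows "access denied" code, so pausing between requests (for example with `Application.Wait`) may also be worth trying.

```vba
' Hypothetical helper, not part of the original code: percent-encodes a search
' term before it is concatenated into the query string.
' Assumption: ASCII-only input; non-ASCII text would need UTF-8 encoding first.
Public Function EncodeQuery(ByVal term As String) As String
    Dim i As Long, ch As String, code As Long, out As String
    For i = 1 To Len(term)
        ch = Mid$(term, i, 1)
        code = AscW(ch)
        ' AscW returns negative values above U+7FFF, so check both ends
        If code < 0 Or code > 127 Then Err.Raise 5, "EncodeQuery", "Non-ASCII input is not handled in this sketch"
        If ch Like "[A-Za-z0-9]" Or InStr("-_.~", ch) > 0 Then
            out = out & ch                                  ' unreserved character: keep
        Else
            out = out & "%" & Right$("0" & Hex$(code), 2)   ' e.g. " " -> "%20", "&" -> "%26"
        End If
    Next i
    EncodeQuery = out
End Function
```

Usage inside the loop: `url = "https://www.google.co.in/search?q=" & EncodeQuery(Cells(i, 2).Value) & "&rnd=" & ...`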

1 Answer
Animai°情兽
Post #2 · 2019-07-28 01:04

The best option (IMO) is to use the News API at newsapi.org (the endpoint the code below calls) and register for an API key. You can then build a query string containing your search term and parse the JSON response to get the result count. I do that below and also populate a collection with the article titles and links. I use a JSON parser called JsonConverter.bas, which you download and import into your project. Then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.


[Image: sample JSON response from the API]

In the JSON, {} denotes an object, which JsonConverter turns into a dictionary that you access by key; [] denotes an array, which becomes a collection that you access by index or loop over with For Each.

I use the key totalResults to retrieve the total results count from the initial dictionary returned by the API.

I then loop the collection of dictionaries (articles) and pull the story titles and URLs.
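To try this access pattern without calling the API, here is a small sketch that hand-builds the shape JsonConverter is assumed to return (a `Scripting.Dictionary` for each `{}` and a `Collection` for each `[]`) and then reads it the same way `GetStories` does. `MakeSampleResponse` and `SummariseResponse` are hypothetical names, and the sample data is made up.

```vba
' Sketch: hand-built stand-in for a parsed News API response.
' {} -> Scripting.Dictionary, [] -> Collection (the assumed JsonConverter mapping).
Public Function MakeSampleResponse() As Object
    Dim root As Object, articles As New Collection, a As Object
    Set a = CreateObject("Scripting.Dictionary")
    a.Add "title", "First story": a.Add "url", "https://example.com/1"
    articles.Add a
    Set a = CreateObject("Scripting.Dictionary")
    a.Add "title", "Second story": a.Add "url", "https://example.com/2"
    articles.Add a
    Set root = CreateObject("Scripting.Dictionary")
    root.Add "totalResults", 2            ' {} : value looked up by key
    root.Add "articles", articles         ' [] : a Collection of dictionaries
    Set MakeSampleResponse = root
End Function

' Reads the count by key, then walks the articles Collection with For Each.
Public Function SummariseResponse(ByVal json As Object) As String
    Dim articles As Object, article As Object, s As String
    s = "total=" & json("totalResults")
    Set articles = json("articles")
    For Each article In articles
        s = s & "|" & article("title")    ' each article is a Dictionary
    Next article
    SummariseResponse = s
End Function
```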

You can then inspect the results in the Locals window or print them out.

[Image: sample of the results in the Locals window]


Option Explicit

Public Sub GetStories()
    Dim articles As Collection, article As Object
    Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)
    Set finalResults = New Collection
    searchTerm = "Obama"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        Set json = JsonConverter.ParseJson(.responseText)
    End With

    Debug.Print "total results = " & json("totalResults")

    Set articles = json("articles")
    For Each article In articles
       arr(0) = article("title")
       arr(1) = article("url")
       finalResults.Add arr
    Next

    Stop '<== Delete me later

End Sub
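If the API key is wrong or the request is rejected, the code above will likely fail later with a confusing error when it reads `json("articles")`. A minimal sketch of a hypothetical `NewsApiError` check follows, assuming News API error responses have the shape `{"status":"error","code":"...","message":"..."}`; confirm this against the API documentation.

```vba
' Hypothetical helper: returns "" when a parsed News API response reports
' "status": "ok", otherwise a "code: message" string.
' Assumes the error shape {"status":"error","code":"...","message":"..."}.
Public Function NewsApiError(ByVal json As Object) As String
    Dim msg As String
    If json.Exists("status") Then
        If json("status") = "ok" Then Exit Function     ' success: return ""
    End If
    If json.Exists("code") Then msg = json("code")
    If json.Exists("message") Then msg = msg & ": " & json("message")
    If Len(msg) = 0 Then msg = "unexpected response"
    NewsApiError = msg
End Function
```

Usage, right after `ParseJson`: `If Len(NewsApiError(json)) > 0 Then Debug.Print NewsApiError(json): Exit Sub`.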

Loop:

If you are running this in a loop, you can use a class, clsHTTP, to hold the XMLHTTP object. Reusing one object is more efficient than creating and destroying a new one on every iteration. I give this class a GetString method that retrieves the JSON response from the API, and a GetInfo method that parses the JSON and returns the result count together with the article titles and URLs.

[Image: example of the results structure in the Locals window]

Class clsHTTP:

Option Explicit   
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        GetString = .responseText
    End With
End Function

Public Function GetInfo(ByVal json As Object) As Variant
    Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object

    finalResults(0) = json("totalResults")
    Set articles = json("articles")

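    ' Guard against searches with no articles: ReDim results(1 To 0, ...) raises error 9
    If articles.Count > 0 Then
        ReDim results(1 To articles.Count, 1 To 2)

        For Each article In articles
            counter = counter + 1
            results(counter, 1) = article("title")
            results(counter, 2) = article("url")
        Next
    End If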

    finalResults(1) = results
    GetInfo = finalResults
End Function

Standard module:

Option Explicit

Public Sub GetStories()
    Dim http As clsHTTP, json As Object
    Dim finalResults(), searchTerms(), searchTerm As Long, url As String
    Set http = New clsHTTP

    With ThisWorkbook.Worksheets("Sheet1")
        searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
    End With

    ReDim finalResults(1 To UBound(searchTerms))

    For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)

        url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"

        Set json = JsonConverter.ParseJson(http.GetString(url))

        finalResults(searchTerm) = http.GetInfo(json)

        Set json = Nothing

    Next

    Stop '<==Delete me later
End Sub
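Since the goal is a result count per search term, you may want to write the counts back next to the terms. Here is a sketch of a hypothetical `CountsColumn` helper that pulls element 0 (the `totalResults` value) out of each entry `GetInfo` returns and builds an n x 1 array suitable for a single range assignment. The output column is an assumption; for example, `ThisWorkbook.Worksheets("Sheet1").Range("B1").Resize(UBound(finalResults), 1).Value = CountsColumn(finalResults)` just before the `Stop`.

```vba
' Hypothetical helper: collects the totalResults value (element 0 of each
' Variant(0 To 1) returned by GetInfo) into an n x 1 array that can be
' written to a worksheet column in one assignment.
Public Function CountsColumn(ByVal finalResults As Variant) As Variant
    Dim out() As Variant, entry As Variant
    Dim i As Long, n As Long, first As Long
    first = LBound(finalResults)
    n = UBound(finalResults) - first + 1
    ReDim out(1 To n, 1 To 1)
    For i = 1 To n
        entry = finalResults(first + i - 1)   ' the Variant(0 To 1) for one term
        out(i, 1) = entry(0)                  ' element 0 holds the count
    Next i
    CountsColumn = out
End Function
```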


Otherwise:

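I would use the following, which grabs story links by their class name, counts them and writes the links to a collection. It needs a reference to Microsoft HTML Object Library for HTMLDocument. Note that it reads a fixed Google News topic page rather than searching for your term, and that the class name .VDXfz appears to be auto-generated, so it may change.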

Option Explicit

Public Sub GetStories()
    Dim sResponse As String, html As HTMLDocument, articles As Collection
    Const BASE_URL As String = "https://news.google.com/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument: Set articles = New Collection
    Dim numberOfStories As Long, nodeList As Object, i As Long
    With html
        .body.innerHTML = sResponse
        Set nodeList = .querySelectorAll(".VDXfz")
        numberOfStories = nodeList.Length
        Debug.Print "number of stories = " & numberOfStories
        For i = 0 To nodeList.Length - 1
            articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
        Next
    End With
    Debug.Print articles.Count
End Sub
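The nested `Replace$` calls above rewrite every `./` and `about:` in the string. A slightly stricter sketch, assuming (as those calls imply) that MSHTML reports links in the form `about:./articles/...`, only strips a leading `about:` and resolves a leading `./`. `ToAbsoluteNewsUrl` is a hypothetical name; you would call it as `articles.Add ToAbsoluteNewsUrl(nodeList.item(i).href)`.

```vba
' Hypothetical helper: converts a relative Google News href as reported by
' MSHTML (e.g. "about:./articles/abc") into an absolute URL.
Public Function ToAbsoluteNewsUrl(ByVal href As String, _
        Optional ByVal baseUrl As String = "https://news.google.com/") As String
    If Left$(href, 6) = "about:" Then href = Mid$(href, 7)        ' drop the MSHTML prefix
    If Left$(href, 2) = "./" Then href = baseUrl & Mid$(href, 3)  ' resolve a leading "./"
    ToAbsoluteNewsUrl = href                                      ' absolute URLs pass through
End Function
```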

Standard Google search:

The following works for an example standard Google search, but you will not always get the same HTML structure, depending on your search term. You will need to provide some failing cases to help me determine whether there is a consistent selector method that can be applied.

Option Explicit
Public Sub GetResultsCount()
    Dim sResponse As String, html As HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.com/search?q=mitsubishi", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
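    Dim stats As Object
    With html
        .body.innerHTML = sResponse
        Set stats = .querySelector("#resultStats")   ' Nothing if the page has no such element
        If stats Is Nothing Then
            Debug.Print "No #resultStats element found"
        Else
            Debug.Print stats.innerText
        End If
    End With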

End Sub
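`#resultStats` gives text such as "About 1,230,000 results (0.45 seconds)" rather than a number. Here is a sketch of a hypothetical `ParseResultCount` helper that returns the number written just before the word "result(s)", assuming English-language results pages; it returns -1 when no count is found, so a missing count stays distinguishable from a genuine zero.

```vba
' Hypothetical helper: extracts the count from result-stats text such as
' "About 1,230,000 results (0.45 seconds)" or "Page 2 of about 1,230,000 results".
' Assumes English wording; returns -1 if no count precedes "result(s)".
Public Function ParseResultCount(ByVal statsText As String) As Double
    Dim words() As String, i As Long, num As String
    ParseResultCount = -1                                   ' default: no count found
    words = Split(Replace$(statsText, ChrW$(160), " "), " ") ' treat non-breaking spaces as spaces
    For i = 1 To UBound(words)
        If LCase$(words(i)) Like "result*" Then
            ' strip thousands separators from the preceding word
            num = Replace$(Replace$(words(i - 1), ",", ""), ".", "")
            If Len(num) > 0 And Not num Like "*[!0-9]*" Then ParseResultCount = CDbl(num)
            Exit Function
        End If
    Next i
End Function
```

In the question's loop this would be `Cells(i, 3) = ParseResultCount(str_text)`.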