I am hoping someone can help....
I have 8000 values in a excel spreadsheet that I need to search in a website and then record a specific line of data from the website to be inputted back into the excel spreadsheet.
I have found a previous post which searches for the data I am looking at excel macro to search a website and extract results
with the code being;
Sub URL_Get_ABN_Query()
strSearch = Range("a1")
With ActiveSheet.QueryTables.Add( _
Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & strSearch & "&safe=active", _
Destination:=Range("a5"))
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
`enter code here`
End Sub
However, when I run then Macro in Excel it collects all the data from the website like this.
I only want the 'entity type' data line to be inputted. I have searched everywhere and can not seem to find how to extend the code to only grab this line of information and input to the corresponding cell (i.e. ABN(b2)search, find input 'entity type' and paste into Company Type(c2).
Alternatively, I have also tried to find how to get the macro to fill the information vertically instead of horizontally as then I could delete the columns that are not needed, I thought this may be a simpler way to run this macro but alas again I could not find help to do it. I also tried to record the macro with developer but that did not work either.
I also need to loop it to run the next ABN and populate the corresponding field and so on (B3>C3, B4>C4, etc etc).
I would love some help figuring, I am a beginner in VBA and think what I want to do is beyond my skill level at this time. I am trying to understand through tutorials, google searches and help pages but can not seem to find how or if this can be done.
My alternative is to do this manually for each of the 8000 data points, copying each abn, searching in the website and then copying the entity type and pasting into excel, I did try this first but after a while started searching for a better way. Can you help????
This is absolutely possible. You've got what I often find the hardest part, sourcing the information from another platform. To make this work I would separate it out a little bit and for simplicity use 2 sheets (Sheet1 with your known data and Sheet2 for the web data).
Loop through your table of ~8000 businesses. We can identify this from the UsedRange number of Rows. We know that the ABN is in column 2 (also known as B) so we copy that into the variable to pass to the function. The function will return the "Entity type:" to column 3 (C) of the same row.
Sub LoopThroughBusinesses()
Dim i As Integer
Dim ABN As String
For i = 2 To Sheet1.UsedRange.Rows.Count
ABN = Sheet1.Cells(i, 2)
Sheet1.Cells(i, 3) = URL_Get_ABN_Query(ABN)
Next i
End Sub
Change the subroutine you created to a Function so it returns the entity type you are after. The function will save the data into Sheet2 and then return just the Entity data that we are after.
Function URL_Get_ABN_Query(strSearch As String) As String ' Change it from a Sub to a Function that returns the desired string
' strSearch = Range("a1") ' This is now passed as a parameter into the Function
Dim entityRange As Range
With Sheet2.QueryTables.Add( _
Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & strSearch & "&safe=active", _
Destination:=Sheet2.Range("A1")) ' Change this destination to Sheet2
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Entity type:")
' Then return the value of the cell to its' right
URL_Get_ABN_Query = entityRange.Offset(0, 1).Value2
' Clear Sheet2 for the next run
Sheet2.UsedRange.Delete
End Function
You do not want a load of connections (queryTables) set up in this way. It will be so slow if even possible. At 8000 requests, provided xmlhttp is not blocked or throttled, the below method will be significantly faster. If there does appear to be slowing/blocking then add in a small wait every x requests.
If possible use xmlhttp to gather data. Use css selectors to specifically target the entity type. Store values in an array and write out with loop at end. Use a class to hold the xmlhttp object for greater efficiency. Provide your class with methods including how to handle not found (example given). Add some further optimizations e.g. given is switching off screen-updating. This assumes your search numbers are in column B from B2. The code below also does some basic checks that there is something present in column B and handles the case of there being 1 or more numbers.
Good code is modular and you want a function to return something and a sub to perform actions. A single sub/function shouldn't complete lots of tasks. You want to easily debug with code that follows the principle of single responsibility (or close to it).
class clsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With http
.Open "GET", URL, False
.send
GetHTML = StrConv(.responseBody, vbUnicode)
End With
End Function
Public Function GetEntityType(ByVal html As HTMLDocument) As String
On Error GoTo errhand:
GetEntityType = html.querySelector("a[href*='EntityTypeDescription']").innerText
Exit Function
errhand:
GetEntityType = "Not Found"
End Function
Standard module:
Option Explicit
Public Sub GetInfo()
Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults(), i As Long, html As HTMLDocument
Set html = New HTMLDocument
Set http = New clsHTTP
Const BASE_URL As String = "http://www.abr.business.gov.au/ABN/View/"
Application.ScreenUpdating = False
With ThisWorkbook.Worksheets("Sheet1")
lastRow = .Cells(.rows.Count, "B").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("B2").Value
Case Else
arr = .Range("B2:B" & lastRow).Value
End Select
ReDim groupResults(1 To lastRow - 1)
With http
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
sResponse = .GetHTML(BASE_URL & arr(i, 1))
html.body.innerHTML = sResponse
groupResults(i) = .GetEntityType(html)
sResponse = vbNullString: html.body.innerHTML = vbNullString
End If
Next
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(i + 1, "C") = groupResults(i)
Next
End With
Application.ScreenUpdating = True
End Sub
References (VBE> Tools > References):
- Microsoft HTML Object Library
CSS selectors:
I use the fact the entity description is a hyperlink (a
tag) and that its value contains the string EntityTypeDescription
to use a css attribute = value with contains (*) operator to target.