My first post on StackOverflow so please go easy on me :)
I'm an intern at a VR company and am trying to scrape Spokeo.com for data. I have successfully extracted the correct data using XMLHTTP (chosen because a large quantity of data is being searched), but I have run into an issue: the information cannot be read unless I am logged in.
E.g. search term: 7980 Sunset Blvd. Result: Company ***** ***** *****, ***** *****.
If I manually log in and search for this location I can get all of the information censored by asterisks.
My question is: how can I log into Spokeo.com through MSXML2.XMLHTTP (with a script similar to those used for Google login forms)?
I have already done a fair bit of research and came across this article, but I'm unable to translate the login from his example to mine because I can't code for shit! haha
This is the working code for my web scraping:
Sub GetOwners()
    Dim URL As String, lastRow As Long, i As Long
    Dim XMLHttp As Object, html As Object, owners As Object
    Dim start_time As Date, end_time As Date

    ' Search terms are in column A, starting at row 2
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    start_time = Time
    Debug.Print "start_time:" & start_time

    For i = 2 To lastRow
        ' The random "rnd" parameter keeps the request from being served from cache
        URL = "https://www.spokeo.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHttp = CreateObject("MSXML2.serverXMLHTTP")
        XMLHttp.Open "GET", URL, False
        XMLHttp.setRequestHeader "Content-Type", "text/xml"
        XMLHttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHttp.send

        ' Parse the response and look up the element that holds the owners
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHttp.responseText
        Set owners = html.GetElementById("property_owners")

        Cells(i, 2).Activate
        If owners Is Nothing Then
            Cells(i, 2).Value = "-"
        Else
            Cells(i, 2).Value = owners.innerText
        End If
        DoEvents
    Next i

    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done. Time taken (minutes): " & DateDiff("n", start_time, end_time)
    MsgBox "done. Time taken (minutes): " & DateDiff("n", start_time, end_time)
End Sub
And this is my Frankensteinian attempt at taking Vitor Prado's code and applying it to my application:
Sub testXMLHTTP()
    ' Requires references to "Microsoft HTML Object Library" and
    ' "Microsoft Script Control 1.0" (the Script Control is 32-bit Office only)
    Dim xml As Object, xml2 As Object
    Dim html As HTMLDocument, html2 As HTMLDocument
    Dim objResult As Object
    Dim strCookie As String, Postdata As String, resultText As String
    Dim vstate As String, vstategen As String, eventval As String

    Set xml = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    Set html = CreateObject("htmlFile")

    ' Request the login page to pick up the cookie and the hidden form fields
    With xml
        .Open "GET", "https://www.spokeo.com/login?", False
        .send
    End With
    strCookie = xml.getResponseHeader("Set-Cookie")
    html.body.innerHTML = xml.responseText

    ' Hidden fields carried over unchanged from the original example
    vstate = html.GetElementById("__VIEWSTATE").Value
    vstategen = html.GetElementById("__VIEWSTATEGENERATOR").Value
    eventval = html.GetElementById("__EVENTVALIDATION").Value

    ' URL-encode the three values with a single JScript helper
    Dim ScriptEngine As ScriptControl
    Set ScriptEngine = New ScriptControl
    ScriptEngine.Language = "JScript"
    ScriptEngine.AddCode "function encode(s) {return encodeURIComponent(s);}"
    vstate = ScriptEngine.Run("encode", vstate)
    vstategen = ScriptEngine.Run("encode", vstategen)
    eventval = ScriptEngine.Run("encode", eventval)

    ' Form fields below are still the ones from the original example
    Postdata = "__EVENTTARGET=" & "&__EVENTARGUMENT=" & _
        "&__VIEWSTATE=" & vstate & _
        "&__VIEWSTATEGENERATOR=" & vstategen & _
        "&__EVENTVALIDATION=" & eventval & _
        "&ctl00$ddlTipoUsuario=#rdBtnNaoContribuinte" & _
        "&ctl00$UserNameAcessivel=Digite+o+Usuário" & _
        "&ctl00$PasswordAcessivel=x" & _
        "&ctl00$ConteudoPagina$Login1$rblTipo=rdBtnNaoContribuinte" & _
        "&ctl00$ConteudoPagina$Login1$UserName=MYUSERNAME" & _
        "&ctl00$ConteudoPagina$Login1$Password=MYPASSWORD" & _
        "&ctl00$ConteudoPagina$Login1$Login=Acessar" & _
        "&ctl00$ConteudoPagina$Login1$txtCpfCnpj=Digite+o+Usuário"

    Set xml2 = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    Set html2 = CreateObject("htmlFile")

    ' Post the form back with the cookie; Content-Length is set by the object itself
    With xml2
        .Open "POST", "https://www.spokeo.com/login?", False
        .setRequestHeader "Cookie", strCookie
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send Postdata
    End With

    html2.body.innerHTML = xml2.responseText
    Set objResult = html2.GetElementById("dadosDoUsuario")
    resultText = objResult.innerText
    MsgBox resultText
End Sub
There are a lot of references that I haven't changed purely because I don't know where to look.
I still plan to combine the two procedures into one, with the login coming first, once I know each works individually.
Any help would be appreciated and I apologize in advance for my coding ignorance.
Cheers!