I'm trying to read data from the Zillow API and am running into data structure issues in R. The output is supposed to be XML and looks like XML, but it doesn't behave like XML.
Specifically, the object that GetSearchResults() returns resembles XML, but R's XML reading functions won't accept it.
Can you tell me how I should approach this?
#set directory
setwd('[YOUR DIRECTORY]')
# setup libraries
library(dplyr)
library(XML)
library(ZillowR)
library(RCurl)
# setup api key
set_zillow_web_service_id('[YOUR API KEY]')
xml = GetSearchResults(address = '120 East 7th Street', citystatezip = '10009')
data = xmlParse(xml)
This throws the following error:
Error: XML content does not seem to be XML
The Zillow API documentation clearly states that the output should be XML, and it certainly looks like it. I'd like to be able to easily access various components of the API output for larger-scale data manipulation / aggregation. Let me know if you have any ideas.
This was a fun opportunity for me to get acquainted with the Zillow API. My approach, following "How to parse XML to R data frame", was to convert the response to a list, for ease of inspection. The onerous bit was figuring out the structure of the data by inspecting the list, particularly because each property might have some missing data. This is why I wrote the getValRange function to deal with parsing the Zestimate data.
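# convert the <results> node of the response into an R structure (one column per property)
results <- xmlToList(xml$response[["results"]])

# return the "low" or "high" end of the valuation range, or NA if it is missing
getValRange <- function(x, hilo) {
  ifelse(hilo %in% unlist(dimnames(x)), x["text", hilo][[1]], NA)
}

# walk over the properties and collect the fields of interest
out <- apply(results, MARGIN = 2, function(property) {
  zpid <- property$zpid
  links <- unlist(property$links)
  address <- unlist(property$address)
  z <- property$zestimate
  # any Zestimate field may be absent, so fall back to NA where needed
  zestdf <- list(
    amount = ifelse("text" %in% names(z$amount), z$amount$text, NA),
    lastupdated = z$"last-updated",
    valueChange = ifelse(length(z$valueChange) == 0, NA, z$valueChange),
    valueLow = getValRange(z$valuationRange, "low"),
    valueHigh = getValRange(z$valuationRange, "high"),
    percentile = z$percentile)
  list(id = zpid, links, address, zestdf)
})

# flatten each property and bind the rows into one data frame
data <- as.data.frame(do.call(rbind, lapply(out, unlist)),
                      row.names = seq_len(length(out)))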
Sample output:
> data[,c("id", "street", "zipcode", "amount")]
          id              street zipcode  amount
1 2098001736 120 E 7th St APT 5A   10009 2321224
2 2101731413 120 E 7th St APT 1B   10009 2548390
3 2131798322 120 E 7th St APT 5B   10009 2408860
4 2126480070 120 E 7th St APT 1A   10009 2643454
5 2125360245 120 E 7th St APT 2A   10009 1257602
6 2118428451 120 E 7th St APT 4A   10009    <NA>
7 2125491284 120 E 7th St FRNT 1   10009    <NA>
8 2126626856 120 E 7th St APT 2B   10009 2520587
9 2131542942 120 E 7th St APT 4B   10009 1257676
# setup libraries
pacman::p_load(dplyr,XML,ZillowR,RCurl) # I use pacman, you don't have to
# setup api key
set_zillow_web_service_id('X1-mykey_31kck')
xml <- GetSearchResults(address = '120 East 7th Street', citystatezip = '10009')
dat <- unlist(xml)
str(dat)
Named chr [1:653] "120 East 7th Street" "10009" "Request successfully
processed" "0" "response" "results" "result" "zpid" "text"
"2131798322" "links" ...
- attr(*, "names")= chr [1:653] "request.address" "request.citystatezip" "message.text" "message.code" ...
dat <- as.data.frame(dat)
dat <- gsub("text","", dat$dat)
I'm not exactly sure what you wanted to do with these results but they're there and they look fine:
head(dat, 20)
[1] "120 East 7th Street"
[2] "10009"
[3] "Request successfully processed"
[4] "0"
[5] "response"
[6] "results"
[7] "result"
[8] "zpid"
[9] ""
[10] "2131798322"
[11] "links"
[12] "homedetails"
[13] ""
[14] "http://www.zillow.com/homedetails/120-E-7th-St-APT-5B-New-York-NY-10009/2131798322_zpid/"
[15] "mapthishome"
[16] ""
[17] "http://www.zillow.com/homes/2131798322_zpid/"
[18] "comparables"
[19] ""
[20] "http://www.zillow.com/homes/comps/2131798322_zpid/"
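If you go the unlist() route, the names of the flattened vector are the handle for pulling fields back out. Here is a minimal sketch of that idea in base R. The nested list `resp` is a made-up stand-in for the GetSearchResults() return value (the real one has more levels, including the "text" entries shown above), so the name patterns are assumptions you would adjust after looking at names(dat).

```r
# Hypothetical nested list standing in for the API result; not real Zillow data.
resp <- list(
  request = list(address = "120 East 7th Street", citystatezip = "10009"),
  results = list(
    result = list(zpid = "1", zestimate = list(amount = "100")),
    result = list(zpid = "2", zestimate = list(amount = "200"))
  )
)

# unlist() flattens the list and joins the nested names with "."
flat <- unlist(resp)

# select entries by matching on the end of the flattened names
zpids   <- unname(flat[grepl("zpid$", names(flat))])
amounts <- as.numeric(flat[grepl("zestimate\\.amount$", names(flat))])

print(names(flat))
print(data.frame(zpid = zpids, amount = amounts))
```

This only lines up cleanly when every property has every field. If some are missing, the vectors will have different lengths, which is the reason the list-based answer above checks for NA per property.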
As stated previously, the trick is to get the API response into a list (as opposed to XML). Then it becomes quite simple to pull out whatever data you are interested in.
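To illustrate the list approach without needing an API key, here is a small sketch that runs xmlToList() on a hand-written XML string. The document below is invented and far smaller than a real Zillow response, and `amount_of` is a hypothetical helper, not part of any package. The second property deliberately has an empty amount to show the missing-data case.

```r
library(XML)

# Made-up XML, written without whitespace between tags; not a real Zillow payload.
doc_text <- paste0(
  "<results>",
  "<result><zpid>111</zpid>",
  "<address><street>1 Main St</street><zipcode>10009</zipcode></address>",
  "<zestimate><amount currency=\"USD\">250000</amount></zestimate></result>",
  "<result><zpid>222</zpid>",
  "<address><street>2 Main St</street><zipcode>10009</zipcode></address>",
  "<zestimate><amount currency=\"USD\"/></zestimate></result>",
  "</results>"
)

doc <- xmlParse(doc_text, asText = TRUE)
results <- xmlToList(xmlRoot(doc))   # a list with one element per <result>

# Hypothetical helper: the amount is a list with a "text" entry when a value
# is present, and only the attribute vector when the element is empty.
amount_of <- function(res) {
  amt <- res$zestimate$amount
  if (is.list(amt) && !is.null(amt$text)) amt$text else NA_character_
}

homes <- data.frame(
  zpid   = vapply(results, function(r) r$zpid, character(1), USE.NAMES = FALSE),
  street = vapply(results, function(r) r$address$street, character(1), USE.NAMES = FALSE),
  amount = as.numeric(vapply(results, amount_of, character(1), USE.NAMES = FALSE)),
  stringsAsFactors = FALSE
)
print(homes)
```

With the real response you would start from the XML node inside the returned object (xml$response in the question's code) rather than from a string, and the field paths would need checking against str() of the converted list.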
I wrote an R package that simplifies this. Take a look on GitHub: https://github.com/billzichos/homer. It comes with a vignette.
Assuming the Zillow ID of the property you are interested in is 36086728, the code would look like this:
home_estimate("36086728")