Successfully coercing paginated JSON object to R d

Posted 2019-06-19 11:11

Question:

I am trying to convert JSON pulled from an API into a data frame in R, so that I can use and analyze the data.

# Load the needed packages
library(RJSONIO)
library(httr)

# Request a list of companies currently fundraising using httr
r <- GET("https://api.angel.co/1/startups?filter=raising")
# Convert to a text object using httr
raise <- content(r, as = "text")
# Convert to a list using RJSONIO
new <- fromJSON(raise)

Once I get this object, new, I am having a really difficult time parsing the list into a dataframe. The json has this structure:

{
  "startups": [
    {
      "id": 6702,
      "name": "AngelList",
      "quality": 10,
      "...": "...",
      "fundraising": {
        "round_opened_at": "2013-07-30",
        "raising_amount": 1000000,
        "pre_money_valuation": 2000000,
        "discount": null,
        "equity_basis": "equity",
        "updated_at": "2013-07-30T08:14:40Z",
        "raised_amount": 0.0
      }
    }
  ],
  "total": 4268,
  "per_page": 50,
  "page": 1,
  "last_page": 86
}

I've tried looking at individual elements within new using code like:

 new$startups[[1]]$fundraising$raised_amount

To pull the raised_amount for the first element listed. However, I don't know how to apply this to the whole list of 4268 startups. In particular, I can't figure out how to deal with the pagination. I only ever seem to get one page of startups (i.e. 50 of them) max.

I tried using a for loop to get the list of startups and just put each value into a row of a dataframe one by one. The example below shows this for just one column, but of course I could do it for all of them just by expanding the for loop. However, I can't get any content on any of the other pages.

# One row per startup on the current page
df1 <- data.frame(index = seq_along(new$startups))
df1$raiseamnt <- 0

for (i in seq_along(new$startups)) {
  df1$raiseamnt[i] <- new$startups[[i]]$fundraising$raised_amount
}
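
For reference, the loop above can also be written without an explicit index. This is only a minimal sketch in base R: the startups list is a small stand-in for new$startups, the second entry ("Example Co") is made up to show a record with no fundraising data, and get_raised is a hypothetical helper name.

```r
# Stand-in for new$startups; the second record is hypothetical and has no
# fundraising block, which would break a plain element-by-element assignment.
startups <- list(
  list(id = 6702, name = "AngelList",
       fundraising = list(raised_amount = 0, discount = NULL)),
  list(id = 1, name = "Example Co", fundraising = NULL)
)

# Return the raised amount, or NA when the field is missing.
get_raised <- function(s) {
  x <- s$fundraising$raised_amount
  if (is.null(x)) NA_real_ else as.numeric(x)
}

# vapply checks that every result is a single number.
df1 <- data.frame(
  id        = vapply(startups, function(s) as.numeric(s$id), numeric(1)),
  raiseamnt = vapply(startups, get_raised, numeric(1))
)
```

Turning a missing value into NA keeps one row per startup, so further columns can be added the same way.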

Edit: Thank you for mentioning pagination. I will look through the documentation more carefully and see if I can figure out how to structure the API calls correctly to get the other pages. I will update this post if/when I figure that out!

Answer 1:

You may find the jsonlite package useful. Below is a quick example.

library(jsonlite)
library(httr)
#request a list of companies currently fundraising using httr
r <- GET("https://api.angel.co/1/startups?filter=raising")
#convert to text object using httr
raise <- content(r, as="text")
#parse JSON
new <- fromJSON(raise)

head(new$startups$id)
[1] 229734 296470 237516 305916 184460 147385

Note, however, that while this package or the one in the question can help parse the JSON string, it is up to the developer to design an appropriate structure so that each element of the string can be added without a problem.
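
As a rough illustration of what that structure might look like, here is a small offline sketch. The JSON string is a shortened copy of the sample in the question (with the "..." entry and some fields left out), and the choice of flatten = TRUE is an assumption about what is convenient, not something the API requires.

```r
library(jsonlite)

# Shortened copy of the sample response from the question.
txt <- '{
  "startups": [
    {
      "id": 6702,
      "name": "AngelList",
      "quality": 10,
      "fundraising": {
        "raising_amount": 1000000,
        "discount": null,
        "raised_amount": 0.0
      }
    }
  ],
  "total": 4268,
  "per_page": 50,
  "page": 1,
  "last_page": 86
}'

# flatten = TRUE expands the nested "fundraising" object into ordinary
# columns named like fundraising.raised_amount.
parsed   <- fromJSON(txt, flatten = TRUE)
startups <- parsed$startups

names(startups)
```

The paging fields (total, per_page, page, last_page) stay available as parsed$total and so on, next to the startups data frame.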

For pagination, the API seems to be a REST API, so a filtering condition is normally added to the URL (e.g. https://api.angel.co/1/startups?filter=raising&variable=value). I guess the exact parameter can be found somewhere in the API docs.
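
If the page turns out to be selected with a query parameter, the calls could be chained roughly as below. This is only a sketch: the parameter name page is an assumption (it mirrors the "page" field in the response but should be checked against the API docs), and fetch_page and fetch_all_startups are hypothetical helper names.

```r
library(httr)
library(jsonlite)

# Fetch one page of results. ASSUMPTION: the API accepts a "page" query
# parameter; verify this in the API documentation.
fetch_page <- function(page) {
  resp <- GET("https://api.angel.co/1/startups",
              query = list(filter = "raising", page = page))
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)
}

# Read page 1, use its last_page field to learn how many pages exist,
# then fetch the rest and stack all pages into one data frame.
fetch_all_startups <- function(fetch = fetch_page) {
  first <- fetch(1)
  pages <- vector("list", first$last_page)
  pages[[1]] <- first$startups
  for (p in seq_len(first$last_page)[-1]) {
    pages[[p]] <- fetch(p)$startups
  }
  rbind_pages(pages)  # jsonlite helper; tolerates differing columns
}
```

Calling all_startups <- fetch_all_startups() would send one request per page (86 in the sample response above), so a pause between requests may be needed if the API is rate limited.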



Answer 2:

The httr library already imports jsonlite (see the httr documentation). A more elegant way, with better-formatted output, is:

library(httr)
resp <- httr::GET("https://api.angel.co/1/startups?filter=raising", accept_json())
cont <- content(resp, as = "parsed", type = "application/json")
# Explicit conversion to a data frame
dataFrame <- data.frame(cont)


Tags: r httr jsonlite