I want to play with data that is now saved in JSON format. But I am very new to R and have little clue of how to play with data. You can see below what I managed to achieve. But first, my code:
library(rjson)
json_file <- "C:\\Users\\Saonkfas\\Desktop\\WOWPAPI\\wowpfinaljson.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
I was able to the data:
for (x in json_data){print (x)}
Although output looks pretty raw:
[[1]]
[[1]]$wins
[1] "118"
[[1]]$losses
[1] "40"
# And so on
Note that the JSON is somewhat nested. I could create tables with Python, but R seems much more complicated.
Edit:
My JSON:
{
"play1": [
{
"wins": "118",
"losses": "40",
"max_killed": "7",
"battles": "158",
"plane_id": "4401",
"max_ground_object_destroyed": "3"
},
{
"wins": "100",
"losses": "58",
"max_killed": "7",
"battles": "158",
"plane_id": "2401",
"max_ground_object_destroyed": "3"
},
{
"wins": "120",
"losses": "38",
"max_killed": "7",
"battles": "158",
"plane_id": "2403",
"max_ground_object_destroyed": "3"
}
],
"play2": [
{
"wins": "12",
"losses": "450",
"max_killed": "7",
"battles": "158",
"plane_id": "4401",
"max_ground_object_destroyed": "3"
},
{
"wins": "150",
"losses": "8",
"max_killed": "7",
"battles": "158",
"plane_id": "2401",
"max_ground_object_destroyed": "3"
},
{
"wins": "120",
"losses": "328",
"max_killed": "7",
"battles": "158",
"plane_id": "2403",
"max_ground_object_destroyed": "3"
}
],
fromJSON
returns a list, you can use the *apply
functions to go through each element.
It's fairly straightforward (once you know what to do!) to convert it to a "table" (data frame is the correct R terminology).
library(rjson)
# You can pass directly the filename
my.JSON <- fromJSON(file="test.json")
df <- lapply(my.JSON, function(play) # Loop through each "play"
{
# Convert each group to a data frame.
# This assumes you have 6 elements each time
data.frame(matrix(unlist(play), ncol=6, byrow=T))
})
# Now you have a list of data frames, connect them together in
# one single dataframe
df <- do.call(rbind, df)
# Make column names nicer, remove row names
colnames(df) <- names(my.JSON[[1]][[1]])
rownames(df) <- NULL
df
wins losses max_killed battles plane_id max_ground_object_destroyed
1 118 40 7 158 4401 3
2 100 58 7 158 2401 3
3 120 38 7 158 2403 3
4 12 450 7 158 4401 3
5 150 8 7 158 2401 3
6 120 328 7 158 2403 3
I find jsonlite
to be a little more user friendly for this task. Here is a comparison of three JSON parsing packages (biased in favor of jsonlite
)
library(jsonlite)
data <- fromJSON('path/to/file.json')
data
#> $play1
# wins losses max_killed battles plane_id max_ground_object_destroyed
# 1 118 40 7 158 4401 3
# 2 100 58 7 158 2401 3
# 3 120 38 7 158 2403 3
#
# $play2
# wins losses max_killed battles plane_id max_ground_object_destroyed
# 1 12 450 7 158 4401 3
# 2 150 8 7 158 2401 3
# 3 120 328 7 158 2403 3
If you want to collapse those list names into a new column, I recommend dplyr::bind_rows
rather than do.call(rbind, data)
library(dplyr)
data <- bind_rows(data, .id = 'play')
# Source: local data frame [6 x 7]
# play wins losses max_killed battles plane_id max_ground_object_destroyed
# (chr) (chr) (chr) (chr) (chr) (chr) (chr)
# 1 play1 118 40 7 158 4401 3
# 2 play1 100 58 7 158 2401 3
# 3 play1 120 38 7 158 2403 3
# 4 play2 12 450 7 158 4401 3
# 5 play2 150 8 7 158 2401 3
# 6 play2 120 328 7 158 2403 3
Beware that the columns may not have the type you expect (notice the columns are all characters since all of the numbers were quoted in the provided JSON data)!
Edit Nov. 2017: One approach to type conversion would be to use mutate_if
to guess the intended type of character columns.
data <- mutate_if(data, is.character, type.convert, as.is = TRUE)
I prefer tidyjson over rjson and jsonlite as it has a easy workflow for converting multilevel nested json objects to 2 dimensional tables. Your problem can be easily solved using this package from github.
devtools::install_github("sailthru/tidyjson")
library(tidyjson)
library(dplyr)
> json %>% as.tbl_json %>% gather_keys %>% gather_array %>%
+ spread_values(
+ wins = jstring("wins"),
+ losses = jstring("losses"),
+ max_killed = jstring("max_killed"),
+ battles = jstring("battles"),
+ plane_id = jstring("plane_id"),
+ max_ground_object_destroyed = jstring("max_ground_object_destroyed")
+ )
Output
document.id key array.index wins losses max_killed battles plane_id max_ground_object_destroyed
1 1 play1 1 118 40 7 158 4401 3
2 1 play1 2 100 58 7 158 2401 3
3 1 play1 3 120 38 7 158 2403 3
4 1 play2 1 12 450 7 158 4401 3
5 1 play2 2 150 8 7 158 2401 3
6 1 play2 3 120 328 7 158 2403 3