I have to deal with JSON documents that contain nested documents and at some level have an array which in turn contains individual documents that conceptionally would map back to a "data frame row" when reading/parsing the JSON in R.
How can I ensure that all data frames
are casted into tibbles
when
retrieving data from the database?
Desired result for example data below
Desired result
query_res$levelOne <- query_res$levelOne %>% tibble::as_tibble()
query_res$levelOne$levelTwo <- query_res$levelOne$levelTwo %>%
tibble::as_tibble()
query_res$levelOne$levelTwo$levelThree <- query_res$levelOne$levelTwo$levelThree %>%
purrr::map(tibble::as_tibble)
query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
If I try to do that via dplyr::mutate()
or purrr::map*_df()
, I get the Error: Column
is of unsupported class data.frame
error.
Related post
Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON
Example
JSON data to put into file dump.json
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}
Importing JSON into MongoDB
con <- mongolite::mongo(
db = "stackoverflow",
collection = "nested_json"
)
con$import(file("dump.json"))
This is what you should see within MongoDB
Query via $find()
query_res <- con$find() %>%
tibble::as_tibble()
query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
Query via $iterate()
it <- con$iterate()
iter_res <- list()
while(!is.null(x <- it$one())) {
# Ensure array columns stay individual list columns when casting to tibble:
# (As opposed to multiple array items being turned into one tibble row)
p <- function(x) {
is.list(x) &&
is.null(names(x))
}
f <- function(x) {
list(x %>% unlist())
}
x <- x %>% purrr::map_if(p, f)
# Necessary to get the `simplifyVector = TRUE` effect:
iter_res_current <- x %>%
jsonlite:::simplify() %>%
tibble::as_tibble()
# Combine with previous iteration results:
iter_res <- c(iter_res, list(iter_res_current))
}
iter_res_df <- iter_res %>%
dplyr::bind_rows()
iter_res_df %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:List of 2
# ..$ :List of 1
# .. ..$ levelThree:'data.frame': 2 obs. of 3 variables:
# .. .. ..$ x: chr "A" "B"
# .. .. ..$ y: int 1 2
# .. .. ..$ z: logi TRUE FALSE
# ..$ :List of 1
# .. ..$ levelThree:'data.frame': 2 obs. of 3 variables:
# .. .. ..$ x: chr "A" "B"
# .. .. ..$ y: int 10 20
# .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"