Ensure that data frames become tibbles when readin

2020-05-08 00:38发布

问题:

I have to deal with JSON documents that contain nested documents and at some level have an array which in turn contains individual documents that conceptionally would map back to a "data frame row" when reading/parsing the JSON in R.

How can I ensure that all data frames are casted into tibbles when retrieving data from the database?

Desired result for example data below

Desired result

query_res$levelOne <- query_res$levelOne %>% tibble::as_tibble()
query_res$levelOne$levelTwo <- query_res$levelOne$levelTwo %>% 
  tibble::as_tibble()
query_res$levelOne$levelTwo$levelThree <- query_res$levelOne$levelTwo$levelThree %>% 
  purrr::map(tibble::as_tibble)

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

If I try to do that via dplyr::mutate() or purrr::map*_df(), I get the Error: Columnis of unsupported class data.frame error.

Related post

Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON


Example

JSON data to put into file dump.json

{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}

Importing JSON into MongoDB

con <- mongolite::mongo(
  db = "stackoverflow",
  collection = "nested_json"
)

con$import(file("dump.json"))

This is what you should see within MongoDB

Query via $find()

query_res <- con$find() %>%
  tibble::as_tibble()

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

Query via $iterate()

it <- con$iterate()

iter_res <- list()
while(!is.null(x <- it$one())) {
  # Ensure array columns stay individual list columns when casting to tibble:
  # (As opposed to multiple array items being turned into one tibble row)
  p <- function(x) {
    is.list(x) &&
      is.null(names(x))
  }
  f <- function(x) {
    list(x %>% unlist())
  }
  x <- x %>% purrr::map_if(p, f)

  # Necessary to get the `simplifyVector = TRUE` effect:
  iter_res_current <- x %>%
    jsonlite:::simplify() %>%
    tibble::as_tibble()

  # Combine with previous iteration results:
  iter_res <- c(iter_res, list(iter_res_current))
}
iter_res_df <- iter_res %>%
  dplyr::bind_rows()

iter_res_df %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:List of 2
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  1 2
#   .. .. ..$ z: logi  TRUE FALSE
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  10 20
#   .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"