I have the following list of list. It contains two variables: pair and genes. The contain of pair
is always vector with two strings. And the variable genes
is a vector which can contain more than 1 values.
lol <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"), .Names = c("pair",
"genes")), structure(list(pair = c("BoneMarrow", "Umbilical"),
genes = "GNB2L1"), .Names = c("pair", "genes")), structure(list(
pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair",
"genes")))
lol
#> [[1]]
#> [[1]]$pair
#> [1] "BoneMarrow" "Pulmonary"
#>
#> [[1]]$genes
#> [1] "PRR11"
#>
#>
#> [[2]]
#> [[2]]$pair
#> [1] "BoneMarrow" "Umbilical"
#>
#> [[2]]$genes
#> [1] "GNB2L1"
#>
#>
#> [[3]]
#> [[3]]$pair
#> [1] "Pulmonary" "Umbilical"
#>
#> [[3]]$genes
#> [1] "ATP1B1"
How can I convert it into this dataframe:
pair1 pair2 genes_vec
BoneMarrow Pulmonary PRR11
BoneMarrow Umbilical GNB2L1
Pulmonary Umbilical ATP1B1
Note that the genes
variable is a vector not single string.
My best try is this which doesn't give what I want:
> do.call(rbind, lapply(lol, data.frame, stringsAsFactors=FALSE))
pair genes
1 BoneMarrow PRR11
2 Pulmonary PRR11
3 BoneMarrow GNB2L1
4 Umbilical GNB2L1
5 Pulmonary ATP1B1
6 Umbilical ATP1B1
Update:
With new example to show vector content of genes
lol2 <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = c("GNB2L1",
"PRR11")), .Names = c("pair", "genes")), structure(list(pair = c("BoneMarrow",
"Umbilical"), genes = "GNB2L1"), .Names = c("pair", "genes")),
structure(list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair",
"genes")))
lol2
#> [[1]]
#> [[1]]$pair
#> [1] "BoneMarrow" "Pulmonary"
#>
#> [[1]]$genes
#> [1] "GNB2L1" "PRR11"
#>
#>
#> [[2]]
#> [[2]]$pair
#> [1] "BoneMarrow" "Umbilical"
#>
#> [[2]]$genes
#> [1] "GNB2L1"
#>
#>
#> [[3]]
#> [[3]]$pair
#> [1] "Pulmonary" "Umbilical"
#>
#> [[3]]$genes
#> [1] "ATP1B1"
The expected output is:
pair1 pair2 genes_vec
BoneMarrow Pulmonary PRR11,GNB2L1
BoneMarrow Umbilical GNB2L1
Pulmonary Umbilical ATP1B1
Maybe like this:
Using
tidyverse
, you could usepurrr
to help youwith the second example, just replace
map_chr(lol, "genes")
withmap(lol2, "genes")
as you want to keep a nested dataframe with a list column.And a more generic approach would be to work with nested tibbles and unnest them as needed
For
lol2
nothing changes unless the listlol2
instead oflol1
You can then unnest what the column you want
This should work:
The same way you treat the genes as a vector is the same way you can the same way you can treat the pairs as a vector.. instead of pair 1 and 2 you just use both of them.