My data frame contains the output of a survey with a select multiple question type. Some cells have multiple values.
df <- data.frame(a=1:3,b=I(list(1,1:2,1:3)))
df
a b
1 1 1
2 2 1, 2
3 3 1, 2, 3
I would like to flatten out the list to obtain the following output:
df
a b
1 1 1
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
should be easy but somehow I can't find the search terms. thanks.
You can just use unnest
from "tidyr":
library(tidyr)
unnest(df, b)
# a b
# 1 1 1
# 2 2 1
# 3 2 2
# 4 3 1
# 5 3 2
# 6 3 3
Using base R
, one option is stack
after naming the list
elements of 'b' column with that of the elements of 'a'. We can use setNames
to change the names.
stack(setNames(df$b, df$a))
Or another option would be to use unstack
to automatically name the list element of 'b' with 'a' elements and then do the stack
to get a data.frame
output.
stack(unstack(df, b~a))
Or we can use a convenient function listCol_l
from splitstackshape
to convert the list
to data.frame
.
library(splitstackshape)
listCol_l(df, 'b')
Here's one way, with data.table
:
require(data.table)
data.table(df)[,as.integer(unlist(b)),by=a]
If b
is stored consistently, as.integer
can be skipped. You can check with
unique(sapply(df$b,class))
# [1] "numeric" "integer"
Here's another base solution, far less elegant than any other solution posted thus far. Posting for the sake of completeness, though personally I would recommend akrun's base solution.
with(df, cbind(a = rep(a, sapply(b, length)), b = do.call(c, b)))
This constructs the first column as the elements of a
, where each is repeated to match the length of the corresponding list item from b
. The second column is b
"flattened" using do.call()
with c()
.
As Ananda Mahto pointed out in a comment, sapply(b, length)
can be replaced with lengths(b)
in the most recent version of R (3.2, if I'm not mistaken).