r: create data frame with all possible options and

Posted 2019-04-13 16:20

Question:

This question might be obvious or asked already, but I can't find a solution:

I want to create a data frame that holds every possible combination of a set of variable names, for every combination size, such that it looks like the following example:

dataframe <- data.frame(variable = 1:4,
                        a = c("gender", NA, NA, NA),
                        b = c("age", NA, NA, NA),
                        c = c("city", NA, NA, NA),
                        d = c("education", NA, NA, NA),
                        e = c("gender", "age", NA, NA),
                        f = c("gender", "city", NA, NA), 
                        g = c("gender", "education", NA, NA), 
                        h = c("age", "city", NA, NA), 
                        i = c("age", "education", NA, NA), 
                        j = c("city", "education", NA, NA), 
                        k = c("gender", "age", "city", NA), 
                        l = c("gender", "age", "education", NA), 
                        m = c("gender", "city", "education", NA),
                        n = c("gender", "age", "city", "education"))

I have too many variables, so it's not worth writing it out, and I want to avoid errors. Thank you for helping!

Answer 1:

Here is an option with combn. Take the vector of variable names (v1, defined under data below), loop over seq_along(v1), and call combn on the vector with m set to the current loop value, which gives every combination of that size. Convert each result to a data.frame and cbind all the list elements together. cbind.fill from rowr is suitable here because it fills with NA any list element that has fewer rows than the tallest data.frame.

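library(rowr)  # provides cbind.fill
res <- do.call(cbind.fill, c(fill = NA, lapply(seq_along(v1), function(i) {
  # all combinations of size i, one combination per column
  m1 <- combn(v1, i)
  # defensive: handle a plain vector as well as a matrix
  if (is.vector(m1)) as.data.frame.list(m1) else as.data.frame(m1)
})))
colnames(res) <- letters[seq_along(res)]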

Or as @Moody_Mudskipper suggested,

res1 <- do.call(cbind.fill, c(fill = NA, lapply(seq_along(v1), function(i) combn(v1, i))))
colnames(res1) <- letters[seq_len(ncol(res1))]
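
To see why the shorter version can skip the data.frame conversion, it may help to look at what combn returns for each size. As far as I can tell, with the default simplify = TRUE every call gives a character matrix with one combination per column, so the pieces differ only in their number of rows. A small check (the names pieces and dims are only for illustration):

```r
v1 <- c("gender", "age", "city", "education")

# One matrix per combination size: rows = size, columns = number of combinations
pieces <- lapply(seq_along(v1), function(i) combn(v1, i))

# Dimensions of each piece (row 1 = nrow, row 2 = ncol)
dims <- sapply(pieces, dim)
dims
```

The columns of dims should read (1, 4), (2, 6), (3, 4) and (4, 1), which is why cbind.fill only has to pad rows with NA.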

data

v1 <- c('gender', 'age', 'city', 'education')
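
If rowr is not available in your environment, the same padding can be sketched in base R. This is not part of the original answer, just one possible variant: it relies on the fact that indexing a vector past its length returns NA. The names combos, padded and res2 are made up for this example.

```r
v1 <- c("gender", "age", "city", "education")

# Every combination of every size, as a flat list of character vectors
combos <- unlist(
  lapply(seq_along(v1), function(i) combn(v1, i, simplify = FALSE)),
  recursive = FALSE
)

# Indexing past the end of a vector gives NA, so this pads each
# combination to length(v1); vapply binds the results as matrix columns
padded <- vapply(combos, function(x) x[seq_along(v1)], character(length(v1)))

# Add the index column from the question and name the combination columns
res2 <- data.frame(variable = seq_along(v1), padded, stringsAsFactors = FALSE)
names(res2)[-1] <- letters[seq_len(ncol(padded))]
```

Note that letters only has 26 entries, while five variables already give 31 combinations, so with more variables something like paste0("c", seq_len(ncol(padded))) is probably a safer naming scheme.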