flatten a data frame

2019-03-25 02:29发布

问题:

I have this nested data frame

test <- structure(list(id = c(13, 27), seq = structure(list(
`1` = c("1997", "1997", "1997", "2007"),
`2` = c("2007", "2007", "2007", "2007", "2007", "2007", "2007")), 
.Names = c("1", "2"))), .Names = c("penr", 
"seq"), row.names = c("1", "2"), class = "data.frame")

I want a list of all values in the second column, namely

result <- c("1997", "1997", "1997", "2007", "2007", "2007", "2007", "2007", "2007", "2007", "2007")

Is there an easy way to achieve this?

回答1:

This line does the trick:

do.call("c", test[["seq"]])

or equivalent:

c(test[["seq"]], recursive = TRUE)

or even:

unlist(test[["seq"]])

The output of these functions is:

    11     12     13     14     21     22     23     24     25     26     27 
"1997" "1997" "1997" "2007" "2007" "2007" "2007" "2007" "2007" "2007" "2007" 

To get rid of the names above the character vector, call as.character on the resulting object:

> as.character((unlist(test[["seq"]])))
 [1] "1997" "1997" "1997" "2007" "2007" "2007" "2007" "2007" "2007" "2007"
[11] "2007"


回答2:

This is not an answer but a follow up/supplement to Paul's answer:

Consistently on any number of iterations the c method performs the best. However as I increased the number of iterations to 100000 unlist went from the poorest to very close to the c method.

1000 iterations

     test replications elapsed relative user.self sys.self user.child sys.child
2       c         1000    0.04 1.333333      0.03        0         NA        NA
1 do.call         1000    0.03 1.000000      0.03        0         NA        NA
3  unlist         1000    0.23 7.666667      0.04        0         NA        NA

100,000 iterations

     test replications elapsed relative user.self sys.self user.child sys.child
2       c       100000    8.39 1.000000      3.62        0         NA        NA
1 do.call       100000   10.47 1.247914      4.04        0         NA        NA
3  unlist       100000    9.97 1.188319      3.81        0         NA        NA

Again thanks for sharing Paul!

Benchmarking performed using rbenchmark on a win 7 machine running R 2.14.1