At one point my code ends up with a list of data frames, which I really want to convert into a single big data frame.
I got some pointers from an earlier question which was trying to do something similar but more complex.
Here's an example of what I am starting with (this is grossly simplified for illustration):
listOfDataFrames <- vector(mode = "list", length = 100)
for (i in 1:100) {
  listOfDataFrames[[i]] <- data.frame(a = sample(letters, 500, replace = TRUE),
                                      b = rnorm(500), c = rnorm(500))
}
I am currently using this:
df <- do.call("rbind", listOfDataFrames)
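If do.call() is unfamiliar: it calls rbind() once and passes every element of the list as a separate argument. Here is a small sketch with three hypothetical one-row data frames:

```r
# Three hypothetical one-row data frames with identical columns
df1 <- data.frame(a = "x", b = 1)
df2 <- data.frame(a = "y", b = 2)
df3 <- data.frame(a = "z", b = 3)
l <- list(df1, df2, df3)

# do.call() unpacks the list into arguments, so this single call
# is equivalent to writing rbind(df1, df2, df3) by hand
via_do_call <- do.call("rbind", l)
by_hand <- rbind(df1, df2, df3)
identical(via_do_call, by_hand)  # TRUE
```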
UPDATE: Rerun on 31-Jan-2018 on the same computer, with newer package versions. Added a seed for seed lovers.
Use bind_rows() from the dplyr package:
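Here is a minimal sketch. It assumes dplyr is installed and uses a smaller, hypothetical stand-in for the question's listOfDataFrames. The .id argument shown at the end is not mentioned above and is included only for illustration:

```r
library(dplyr)

set.seed(1)
# Smaller stand-in for the question's list: 10 data frames of 5 rows each
listOfDataFrames <- lapply(1:10, function(i) {
  data.frame(a = sample(letters, 5, replace = TRUE),
             b = rnorm(5), c = rnorm(5))
})

# bind_rows() takes the list directly; no do.call() needed
df <- bind_rows(listOfDataFrames)
dim(df)  # 50 rows, 3 columns

# Individual data frames can also be passed as separate arguments
df_two <- bind_rows(listOfDataFrames[[1]], listOfDataFrames[[2]])

# .id adds a column recording which list element each row came from
# (the list is unnamed, so the labels are "1", "2", ..., "10")
df_id <- bind_rows(listOfDataFrames, .id = "source")
```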
There is also bind_rows(x, ...) in dplyr.

One other option is to use a plyr function:
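The plyr function in question is presumably rbind.fill(), which the edit further down names. Here is a minimal sketch, assuming plyr is installed. The two small data frames are hypothetical and were chosen to show the NA filling that plain rbind() does not do:

```r
library(plyr)

# Hypothetical inputs: df2 lacks column b and has an extra column c
df1 <- data.frame(a = c("x", "y"), b = 1:2)
df2 <- data.frame(a = "z", c = TRUE)

# rbind.fill() accepts a list of data frames, matches columns by name,
# and fills columns missing from an input with NA
# (plain rbind() would stop with an error on these inputs)
combined <- rbind.fill(list(df1, df2))
combined
```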
This is a little slower than the original:
My guess is that using do.call("rbind", ...) is going to be the fastest approach you will find, unless you can (a) use matrices instead of data frames and (b) preallocate the final matrix and assign into it rather than growing it.
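Here is a minimal base-R sketch of the (a) plus (b) idea. It assumes every column is numeric. A matrix holds a single type, so the question's character column a is left out, and the pieces are hypothetical:

```r
set.seed(42)
n_frames <- 100
rows_each <- 500

# (a) Hypothetical numeric-only pieces stored as matrices, not data frames
pieces <- lapply(seq_len(n_frames), function(i) {
  cbind(b = rnorm(rows_each), c = rnorm(rows_each))
})

# (b) Preallocate the full result once...
result <- matrix(NA_real_, nrow = n_frames * rows_each, ncol = 2,
                 dimnames = list(NULL, c("b", "c")))

# ...then write each piece into its own block of rows instead of growing a table
for (i in seq_len(n_frames)) {
  rows <- ((i - 1) * rows_each + 1):(i * rows_each)
  result[rows, ] <- pieces[[i]]
}

dim(result)  # 50000 rows, 2 columns
```

In real code, most of the benefit would likely come from writing each chunk directly into result as it is produced, rather than first collecting a list of pieces.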
Based on Hadley's comment, here's the latest version of rbind.fill from CRAN:

This is easier than rbind, and marginally faster (these timings hold up over multiple runs). And as far as I understand it, the version of plyr on GitHub is even faster than this.

The only thing the data.table solutions are missing is an identifier column showing which data frame in the list each row came from. Something like this:
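Here is a minimal sketch of what that could look like. It assumes data.table's rbindlist() and its idcol argument, which the next paragraph refers to. The small list is hypothetical:

```r
library(data.table)

set.seed(1)
# Hypothetical small list: 3 data frames of 2 rows each
listOfDataFrames <- lapply(1:3, function(i) {
  data.frame(a = sample(letters, 2, replace = TRUE), b = rnorm(2))
})

# idcol = TRUE prepends a column named ".id"; for an unnamed list
# it holds each row's position in the list (1, 1, 2, 2, 3, 3)
dt <- rbindlist(listOfDataFrames, idcol = TRUE)

# For a named list, the names become the id values,
# and a string chooses the column name
dt_named <- rbindlist(setNames(listOfDataFrames, c("x", "y", "z")),
                      idcol = "source")
```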
The idcol parameter adds a column (.id) identifying which data frame in the list each row came from. The result would look something like this:

For the purpose of completeness, I thought the answers to this question needed an update. "My guess is that using do.call("rbind", ...) is going to be the fastest approach that you will find..." That was probably true in May 2010 and for some time after, but in about Sep 2011 a new function, rbindlist, was introduced in the data.table package version 1.8.2, with a remark that "This does the same as do.call("rbind",l), but much faster". How much faster?
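The original timings are not reproduced here. Below is a minimal sketch for measuring the difference yourself. It assumes data.table is installed and uses base system.time() rather than a dedicated benchmarking package. The numbers it prints will depend on your machine and package versions:

```r
library(data.table)

set.seed(42)
# Same shape as the question's example: 100 data frames of 500 rows each
listOfDataFrames <- lapply(1:100, function(i) {
  data.frame(a = sample(letters, 500, replace = TRUE),
             b = rnorm(500), c = rnorm(500))
})

# Wall-clock time for each approach; a single run is noisy,
# so repeat it (or use a benchmarking package) before drawing conclusions
t_rbind     <- system.time(df_base <- do.call("rbind", listOfDataFrames))
t_rbindlist <- system.time(dt <- rbindlist(listOfDataFrames))

c(do.call_rbind = t_rbind[["elapsed"]], rbindlist = t_rbindlist[["elapsed"]])
```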