I have a dataframe
df <- data.frame(
id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01",
"1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19",
"1990-06-19", "2000-05-12"),
spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
y = rpois(10, 5))
id date spp y
1 1 1985-06-19 a 6
2 1 1985-06-19 b 3
3 1 1985-06-19 c 7
4 2 1985-08-01 c 7
5 2 1985-08-01 d 6
6 3 1990-06-19 b 5
7 3 1990-06-19 c 4
8 3 1990-06-19 d 4
9 3 1990-06-19 a 6
10 4 2000-05-12 b 6
I want to expand it so that there is every combination of id and spp and have y = 0
for every combination that is not currently in the dataframe. The dataframe is currently about 100,000 rows and 15 columns. When expanded it would be about 300,000 columns (there are 17 unique values of spp
in my actual dataset).
For every value of id
the date
is the same (e.g. when id = 2, date always = 1985-08-01). In my real dataset all the columns except spp
and y
can be specified by the id
.
I want to end up with something like:
id date spp y
1 1985-06-19 a 6
1 1985-06-19 b 3
1 1985-06-19 c 7
1 1985-06-19 d 0*
2 1985-08-01 a 0*
2 1985-08-01 b 0*
2 1985-08-01 c 7
2 1985-08-01 d 6
3 1990-06-19 b 5
3 1990-06-19 c 4
3 1990-06-19 d 4
3 1990-06-19 a 6
4 2000-05-12 a 0*
4 2000-05-12 b 6
4 2000-05-12 c 0*
4 2000-05-12 d 0*
- Indicate added rows
I will likely have to do this in the future with potentially much larger data frames, so a quick, efficient (time and memory) way to do this would be appreciated but any solution would satisfy me. I figure there should be ways to use the dplyr
, data.table
, or reshape
packages but I'm not very familiar with any of them. I'm not sure if it would be easiest to expand just rows id, spp, and y, then do a left_join()
or merge()
to recombine date (and all the other variables in my real dataframe) based on id
?
expand.grid
is a useful function here,Since you're not actually doing anything to subsets of the data I don't think
plyr
will help, maybe more efficient ways withdata.table
.There is a new function
complete
in the development version oftidyr
that does this. Of coursecomplete
usesexpand.grid
internally.I would go the second way, hope this helps