I have date ranges that are grouped by two variables (id and type) that are currently stored in a data frame called data. My goal is to expand the date range such that I have a row for each day within the range of dates, which includes the same id and type.
Here is a snippet to reproduce an example of the data frame:
data <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), type = c("a",
"a", "b", "c", "b", "a", "c", "d", "e", "f"), from = structure(c(1235199600,
1235545200, 1235545200, 1235631600, 1235631600, 1242712800, 1242712800,
1243058400, 1243058400, 1243231200), class = c("POSIXct", "POSIXt"
), tzone = ""), to = structure(c(1235372400, 1235545200, 1235631600,
1235890800, 1236236400, 1242712800, 1243058400, 1243231200, 1243144800,
1243576800), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("id",
"type", "from", "to"), row.names = c(700L, 753L, 2941L, 2178L,
2959L, 679L, 2185L, 12L, 802L, 1796L), class = "data.frame")
This is a visual representation of the data set:
id type from to
1 a 2009-02-21 2009-02-23
1 a 2009-02-25 2009-02-25
1 b 2009-02-25 2009-02-26
1 c 2009-02-26 2009-03-01
1 b 2009-02-26 2009-03-05
2 a 2009-05-19 2009-05-19
2 c 2009-05-19 2009-05-23
2 d 2009-05-23 2009-05-25
2 e 2009-05-23 2009-05-24
2 f 2009-05-25 2009-05-29
Here is a visual representation of the intended result:
id type date
1 a 2009-02-21
1 a 2009-02-22
1 a 2009-02-23
1 b 2009-02-25
1 b 2009-02-26
1 c 2009-02-26
1 c 2009-02-27
1 c 2009-02-28
1 c 2009-03-01
...
2 f 2009-05-25
2 f 2009-05-26
2 f 2009-05-27
2 f 2009-05-28
2 f 2009-05-29
I've found several similar posts (link and link) that were helpful in giving me a starting point. I've attempted to use a plyr solution:
data2 <- adply(data, 1, summarise, date = seq(data$from, data$to))[c('id', 'type')]
However, this results in the error:
Error: 'from' must be of length 1
I have also attempted to use a data.table solution:
data[, list(date = seq(from, to)), by = c('id', 'type')]
However, this gives me a different error:
Error in `[.data.frame`(data, , list(date = seq(from, to)), by = c("id", :
unused argument (by = c("id", "type"))
Any thoughts on how to go about resolving these errors (or using a different approach) would be greatly appreciated.
1) by Here is a three line answer using by from the base of R. First we convert the dates to "Date" class giving data2. Then we apply f which does the real work over each row and finally we rbind the resulting rows together:
2) data.table Using the same data2 with data.table we do it like this:
2a) data.table or alternately this where dt is from (2) and f is from (1):
3) dplyr With dplyr it gives a warning but otherwise works where data2 and f are from (1):
UPDATES Some improvements.
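As a rough illustration of approach (1), here is a minimal base R sketch of the convert, apply, rbind steps described above. It is an assumption about how such code could look, not necessarily the answer's original code. The small data2 below is a hypothetical stand-in holding the first three rows of the question's data, with from and to already of class "Date", and the body of f is likewise assumed.

```r
# Hypothetical stand-in for data2: first three rows of the question's data,
# with from/to already converted to class "Date".
data2 <- data.frame(
  id   = c(1, 1, 1),
  type = c("a", "a", "b"),
  from = as.Date(c("2009-02-21", "2009-02-25", "2009-02-25")),
  to   = as.Date(c("2009-02-23", "2009-02-25", "2009-02-26")),
  stringsAsFactors = FALSE
)

# f expands a single row into one row per day between from and to;
# id and type are recycled to the length of the date sequence.
f <- function(x) with(x, data.frame(id, type, date = seq(from, to, by = "day")))

# Apply f to each row (grouping by row number) and stack the pieces.
result <- do.call("rbind", by(data2, seq_len(nrow(data2)), f))
rownames(result) <- NULL
```

Passing by = "day" to seq is what avoids the length-1 error from the question: each call sees exactly one from and one to, because by hands f a single row at a time.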
Here is one way to perform such a transformation using base functions:
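One possible base R sketch of this kind of transformation, not necessarily the code this answer used, repeats each row once per day in its range and then adds a running day offset. The ranges data frame and the names n_days, idx, and expanded are hypothetical and chosen only for illustration.

```r
# Hypothetical two-row example with Date columns.
ranges <- data.frame(
  id   = c(1, 2),
  type = c("a", "f"),
  from = as.Date(c("2009-02-21", "2009-05-25")),
  to   = as.Date(c("2009-02-23", "2009-05-29")),
  stringsAsFactors = FALSE
)

# Number of days covered by each range, counting both endpoints.
n_days <- as.integer(ranges$to - ranges$from) + 1L

# Row index repeated once per day, so id and type are carried along.
idx <- rep(seq_len(nrow(ranges)), times = n_days)

# sequence() restarts at 1 for every range, giving the per-row day offset.
expanded <- data.frame(
  id   = ranges$id[idx],
  type = ranges$type[idx],
  date = ranges$from[idx] + sequence(n_days) - 1L
)
```

This version avoids calling seq once per row, which may matter for larger data frames, at the cost of being a little less readable.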
And the first chunk of the output is