I'm trying to summarize a data.frame that contains date (or time) information.
Suppose it holds hospitalization records by patient, like this one:
df <- data.frame(
  patient.id = c(1, 2, 1, 1, 2, 2),
  hospitalization.date = as.Date(c("2013/10/15", "2014/10/15", "2015/7/16",
                                   "2016/1/7", "2015/12/20", "2015/12/25"))
)
df looks like this:
> df
patient.id hospitalization.date
1 1 2013-10-15
2 2 2014-10-15
3 1 2015-07-16
4 1 2016-01-07
5 2 2015-12-20
6 2 2015-12-25
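For each observation, I need to count the number of hospitalizations of the same patient occurring in the 365 days up to that hospitalization (the hospitalization itself is counted, so the first record for a patient gets 1).
In my example, the result would be the new df$hospitalizations.last.year column: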
> df
patient.id hospitalization.date hospitalizations.last.year
1 1 2013-10-15 1
2 2 2014-10-15 1
3 1 2015-07-16 1
4 2 2015-12-20 1
5 2 2015-12-25 2
6 1 2016-01-07 2
7 2 2016-02-10 3
Note that the counter covers the patient's records from the previous 365 days, not only those in the current calendar year.
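To make the rule concrete, here is a slow base R sketch of what I mean. It is only a reference for the expected result, not the fast solution I'm after. The helper name count_last_year is made up for illustration, and I'm assuming the window is inclusive at both ends (a record exactly 365 days earlier still counts), which my example data does not actually exercise.

```r
# Same example data as above, rebuilt here so the snippet runs on its own.
df <- data.frame(
  patient.id = c(1, 2, 1, 1, 2, 2),
  hospitalization.date = as.Date(c("2013-10-15", "2014-10-15", "2015-07-16",
                                   "2016-01-07", "2015-12-20", "2015-12-25"))
)

# Hypothetical helper: for row i, count the same patient's records dated
# within the 365 days up to and including row i's date.
# Assumption: both ends of the window are inclusive (0 <= days_before <= 365).
count_last_year <- function(i, d) {
  same_patient <- d$patient.id == d$patient.id[i]
  days_before <- as.numeric(d$hospitalization.date[i] - d$hospitalization.date)
  sum(same_patient & days_before >= 0 & days_before <= 365)
}

# One pass per row, so this is O(n^2) overall: fine for a demo, too slow for real data.
df$hospitalizations.last.year <- vapply(
  seq_len(nrow(df)), count_last_year, integer(1), d = df
)

# Show the result sorted by date, like the expected output above.
df[order(df$hospitalization.date), ]
```

This compares every row against every other row, which is exactly what I want to avoid on the real dataset.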
I'm trying to do this with dplyr or data.table because my dataset is huge and performance matters. Is it possible?