calculate rolling sum based on row index in R

2019-03-31 09:18发布

问题:

I am trying to calculate a grouped rolling sum based on a window size k but, in the event that the within group row index (n) is less than k, I want to calculate the rolling sum using the condition k=min(n,k).

My issue is similar to this question R dplyr rolling sum but I am looking for a solution that provides a non-NA value for each row.

I can get part of the way there using dplyr and rollsum:

library(zoo)
library(dplyr)
df <- data.frame(Date=rep(seq(as.Date("2000-01-01"),
            as.Date("2000-12-01"),by="month"),2),
            ID=c(rep(1,12),rep(2,12)),value=1)
df <- tbl_df(df)
df <- df %>% 
        group_by(ID) %>%
        mutate(total3mo=rollsum(x=value,k=3,align="right",fill="NA"))

df
Source: local data frame [24 x 4]
Groups: ID [2]

     Date    ID value tota3mo
   (date) (dbl) (dbl)   (dbl)
1  2000-01-01     1     1      NA
2  2000-02-01     1     1      NA
3  2000-03-01     1     1       3
4  2000-04-01     1     1       3
5  2000-05-01     1     1       3
6  2000-06-01     1     1       3
7  2000-07-01     1     1       3
8  2000-08-01     1     1       3
9  2000-09-01     1     1       3
10 2000-10-01     1     1       3
..        ...   ...   ...     ...

In this case, what I would like is to return the value 1 for observations on 2000-01-01 and the value 2 for observations on 2000-02-01. More generally, I would like the rolling sum to be calculated over the largest window possible but no larger than k.

In this particular case it's not too difficult to change some NA values by hand. However, ultimately I would like to add several more columns to my data frame that will be rolling sums calculated over various windows. In this more general case it will get quite tedious to go back change many NA values by hand.

回答1:

Using the partial=TRUE argument of rollapplyr :

df %>%
   group_by(ID) %>%
   mutate(roll = rollapplyr(value, 3, sum, partial = TRUE)) %>%
   ungroup()

or without dplyr (still need zoo):

roll <- function(x) rollapplyr(x, 3, sum, partial = TRUE)
transform(df, roll = ave(value, ID, FUN = roll))