Sum over rows (rollapply) with time decay

Posted 2020-05-08 08:15

Question:

This is a follow-up to a question I posted earlier (see "Sum over rows with multiple changing conditions R data.table" for more details). I want to calculate how many times each of the 3 subjects has experienced an event in the last 5 years. So far I have been summing over a rolling window using rollapply from the zoo package. This assumes that experience from 5 years ago is as important as experience from 1 year ago (same weighting), so now I want to include a time decay for the experience that enters the sum. In other words, experience from 5 years ago should enter the sum with a lower weight than experience from 1 year ago.

In my case I want to include an age-dependent decay (although for other applications faster or slower decays, such as squares or square roots, would be possible).

For example, let's assume I have the following data (I build on the previous data for clarity):

mydf <- data.frame (Year = c(2000, 2001, 2002, 2004, 2005,
                         2007, 2000, 2001, 2002, 2003,
                         2003, 2004, 2005, 2006, 2006, 2007),
                Name = c("Tom", "Tom", "Tom", "Fred", "Gill",
                         "Fred", "Gill", "Gill", "Tom", "Tom",
                         "Fred", "Fred", "Gill", "Fred", "Gill", "Gill"))

# Create an indicator for the experience 
mydf$Ind <- 1

# Load required packages
library(data.table)
library(zoo)

# Set data.table
setDT(mydf)
setkey(mydf, Name,Year)

# Perform cartesian join to calculate experience. I2 is the new experience indicator 
m <- mydf[CJ(unique(Name),seq(min(Year)-5, max(Year))),allow.cartesian=TRUE][,
        list(Ind = unique(Ind), I2 = sum(Ind,na.rm=TRUE)),
        keyby=list(Name,Year)]

# This is the approach I have been taking so far. Note that this is a simple rolling sum of I2
m[,Exp := rollapply(I2, 5, function(x) sum(head(x,-1)), 
                align = 'right', fill=0),by=Name]

So the question now is: how can I include an age-dependent decay in this calculation? To model this, I need to divide each experience by its age before it enters the sum.
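To make the intended weighting concrete, here is a minimal sketch for a single window. It assumes the five years before 2007 enter the sum and uses Fred's event counts from the data above; the variable names (years, events, age) are only illustrative.

```r
# Fred's event counts for the five years before 2007 (taken from mydf above)
years  <- 2002:2006
events <- c(0, 1, 2, 0, 1)

# Age of each experience as seen from 2007: 5 4 3 2 1
age <- 2007 - years

sum(events)              # unweighted sum: 4
sum(events / age)        # age-dependent decay: 1/4 + 2/3 + 1/1 = 1.916667

# Other decay shapes mentioned above, shown only for comparison
sum(events / sqrt(age))  # slower decay (square root)
sum(events / age^2)      # faster decay (square)
```

The age-weighted value is the one I expect for Fred in 2007.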

I have been trying to get it to work using something along these lines:

 m[,Exp_age := rollapply(I2, 5, function(x) sum(head(x,-1)/(tail((Year))-head(Year,-1))), 
                     align = 'right', fill=0),by=Name]

But it does not work. I think my main problem is that I cannot get the age of the experience right, so I cannot divide by the age in the sum. The result should look like the Exp_age column in the myres data.frame below:

myres <- data.frame(Name = c("Fred", "Fred", "Fred", "Fred", "Fred", 
                         "Gill", "Gill", "Gill", "Gill", "Gill", "Gill", 
                         "Tom", "Tom", "Tom", "Tom", "Tom"), 
                Year = c(2003, 2004, 2004, 2006, 2007, 2000, 2001, 2005,
                         2005, 2006, 2007, 2000, 2001, 2002, 2002, 2003), 
                Ind = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
                Exp = c(0, 1, 1, 3, 4, 0, 1, 1, 1, 2, 3, 0, 1, 2, 2, 4), 
                Exp_age = c(0, 1, 1, 1.333333333, 1.916666667, 0, 1, 0.45, 
                            0.45, 2.2, 2, 0, 1, 1.5, 1.5, 2.833333333))

Any pointers would be greatly appreciated!

Answer 1:

If I understand you correctly, you are doing a rollapply with width=5 and, rather than a simple sum, you want a weighted sum in which the weights depend on the age of the experience within the 5-year window. I would do this: first set the key on your data.table so that it is sorted in increasing order by Name and then Year (you do this in your code already). Then you know that the last item in your x variable is the youngest and the first item is the oldest. I can't quite tell which way you want the weights to go (greatest weight for the youngest or for the oldest), but you get the point:

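# Key by Name and Year so that, within each window, the last element is the most recent year
setkey(m, Name, Year)
# Weighted sum: weights 1..length(x) increase toward the most recent year
my_fun = function(x) { w = seq_along(x); sum(x * w) }
m[, Exp_age := rollapply(I2, width = 5, by = 1, fill = NA, FUN = my_fun, by.column = FALSE, align = "right"), by = Name]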
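For the 1/age decay described in the question, a variation on the same idea might look like the sketch below. It assumes that the five years before the current year enter the sum (window width 6 with the current year dropped), which is the window that reproduces the Exp_age values in myres, for example 0.45 for Gill in 2005. The names counts, grid, m2, n_back and decay_sum are made up for illustration.

```r
library(data.table)
library(zoo)

mydf <- data.table(
  Year = c(2000, 2001, 2002, 2004, 2005, 2007, 2000, 2001, 2002, 2003,
           2003, 2004, 2005, 2006, 2006, 2007),
  Name = c("Tom", "Tom", "Tom", "Fred", "Gill", "Fred", "Gill", "Gill",
           "Tom", "Tom", "Fred", "Fred", "Gill", "Fred", "Gill", "Gill"))

# Number of events per Name and Year
counts <- mydf[, .(I2 = .N), by = .(Name, Year)]

# Complete Name x Year grid, padded with 5 leading years so early windows are full
grid <- CJ(Name = unique(mydf$Name),
           Year = seq(min(mydf$Year) - 5, max(mydf$Year)))
m2 <- counts[grid, on = .(Name, Year)]
m2[is.na(I2), I2 := 0L]          # years without an event contribute nothing
setkey(m2, Name, Year)

n_back <- 5                      # assumption: experiences aged 1..5 years count

decay_sum <- function(x) {
  past <- head(x, -1)            # drop the current year
  age  <- rev(seq_along(past))   # oldest element gets the largest age
  sum(past / age)                # each experience divided by its age
}

m2[, Exp_age := rollapply(I2, width = n_back + 1, FUN = decay_sum,
                          align = "right", fill = 0),
   by = Name]

# Keep only the years in which an event actually happened
m2[I2 > 0]
```

Swapping `past / age` for `past / sqrt(age)` or `past / age^2` would give the slower or faster decays mentioned in the question. Note that the Exp column in the question uses width 5, i.e. four prior years, so the two columns cover slightly different windows.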