I am working with an unbalanced, irregularly spaced cross-sectional time series. My goal is to obtain a lagged moving average vector for the "Quantity" vector, segmented by "Subject".
In other words, say the the the following Quanatities have been observed for Subject_1: [1,2,3,4,5]. I first need to lag it by 1, yielding [NA,1,2,3,4].
Then I need to take a moving average of order 3, yielding [NA,NA,NA,(3+2+1)/3,(4+3+2)/3]
The above needs to be done for all Subjects.
# Construct example balanced panel DF
panel <- data.frame(
as.factor(sort(rep(1:6,5))),
rep(1:5,6),
rnorm(30)
)
colnames(panel)<- c("Subject","Day","Quantity")
#Make panel DF unbalanced
panelUNB <- subset(panel,as.numeric(Subject)!= Day)
panelUNB <- panelUNB[-c(15,16),]
If the panel were balanced, i would first lag the "Quantity" variable using package plm
and functionlag
.
Then I would take the moving average of the lagged "Quanatity" like so, using function rollmean
from package zoo
:
panel$QuantityMA <- ave(panel$Quantity, panel$Subject, FUN = function(x) rollmean(
x,3,align="right",fill=NA,na.rm=TRUE))
This will yield the proper result when applied to the balanced 'panel' DF.
The problem is that plm
and lag
rely on the series being evenly spaced to generate an index variable, while rollapply demands that the number of observations (windowsize) is equal for all subjects.
There is solution on StackExchange with data.table that hints at a solution to my problem: Producing a rolling average of an unbalanced panel data set
Perhaps this solution can be modified to produce a fixed-length moving average instead of a "rolling cumulative average."
Does this give you the desired results?
So, to answer my own question, one way to do it is through split-lapply(rollingaverage)-unlist:
The "QuantityMA" vector would then have to be added back to the main "panelUNB" frame. Seems to be working. Lagging can be accomplished on an unbalanced panel with ddply.
If anyone has another, perhaps more elegant, solution, you're welcome to share it.