I am working with an unbalanced, irregularly spaced cross-sectional time series. My goal is to obtain a lagged moving average vector for the "Quantity" vector, segmented by "Subject".
In other words, say the the the following Quanatities have been observed for Subject_1:
[1,2,3,4,5]. I first need to lag it by 1, yielding [NA,1,2,3,4].
Then I need to take a moving average of order 3, yielding [NA,NA,NA,(3+2+1)/3,(4+3+2)/3]
The above needs to be done for all Subjects.
# Construct example balanced panel DF
panel <- data.frame(
as.factor(sort(rep(1:6,5))),
rep(1:5,6),
rnorm(30)
)
colnames(panel)<- c("Subject","Day","Quantity")
#Make panel DF unbalanced
panelUNB <- subset(panel,as.numeric(Subject)!= Day)
panelUNB <- panelUNB[-c(15,16),]
If the panel were balanced, i would first lag the "Quantity" variable using package plm
and functionlag
.
Then I would take the moving average of the lagged "Quanatity" like so, using function rollmean
from package zoo
:
panel$QuantityMA <- ave(panel$Quantity, panel$Subject, FUN = function(x) rollmean(
x,3,align="right",fill=NA,na.rm=TRUE))
This will yield the proper result when applied to the balanced 'panel' DF.
The problem is that plm
and lag
rely on the series being evenly spaced to generate an index variable, while rollapply demands that the number of observations (windowsize) is equal for all subjects.
There is solution on StackExchange with data.table that hints at a solution to my problem: Producing a rolling average of an unbalanced panel data set
Perhaps this solution can be modified to produce a fixed-length moving average instead of a "rolling cumulative average."
Does this give you the desired results?
library(reshape2)
library(zoo)
# create time series where each subject have an observation at each time step
d1 <- data.frame(subject = rep(letters[1:4], each = 5),
day = rep(1:5, 4),
quantity = sample(x = 1:4, size = 20, replace = TRUE))
d1
# select some random observations
d2 <- d1[sample(x = seq_len(nrow(d1)), size = 15), ]
d2
# reshape to wide format with dcast
# -> 'automatic' extension from irregular to regular series for each subject,
# _given_ that all time steps are represented.
# Alternative method below more explicit
# fill for structural missings defaults to NA
d3 <- dcast(d2, day ~ subject, value.var = "quantity")
d3
# convert to zoo time series
z1 <- zoo(x = d3[ , -1], order.by = d3$day)
################################
# alternative method to extend time series
# time steps to include are given explicitly
# create a zero-dimensional zoo series
z0 <- zoo(, min(d1$day):max(d1$day))
# extend z1 to contain the same time indices as z0
z1 <- merge(z1, z0)
################################
# lag, defaults to one unit
z2 <- lag(x = z1)
z2
# calculate rolling mean with window width 3
rollmeanr(x = z2, k = 3)
# Handling of NAs:
# from ?rollmean:
# "The default method of rollmean does not handle inputs that contain NAs.
# In such cases, use rollapply instead.":
rollapplyr(data = z2, width = 3, FUN = mean, na.rm = TRUE)
So, to answer my own question, one way to do it is through split-lapply(rollingaverage)-unlist:
Temp <-with(panelUNB, split(Quantity, Subject))
Temp <- lapply(Temp, FUN=function (x) rollapplyr(
x,2,align="right",fill=NA,na.rm=TRUE, FUN=mean))
QuantityMA <-unlist(Temp)
The "QuantityMA" vector would then have to be added back to the main "panelUNB" frame. Seems to be working. Lagging can be accomplished on an unbalanced panel with ddply.
If anyone has another, perhaps more elegant, solution, you're welcome to share it.