I need to calculate a rolling VaR of stock returns. From the post "Using rollapply function for VaR calculation using R", I understand that columns consisting entirely of missing values cause an error. Because the start and end dates of the return series differ across firms, converting the data from long to wide format creates missing values. Estimation could use only the rows with no missing values, but that discards a lot of data. Is there a way to run the calculation on data that includes completely missing columns and get 'NA' as the output for those columns? This is what I did:
library(PerformanceAnalytics)
data(managers)
VaR(managers, p=.95, method="modified")
This performs the desired calculation, but when I try it on the first 60 rows, where the 'HAM6' column is completely missing,
managers2<-managers[1:60,]
VaR(managers2, p=.95, method="modified")
I get the following error:
Error in dimnames(cd) <- list(as.character(index(x)), colnames(x)) :
'dimnames' applied to non-array
I understand that the error is due to the completely missing 'HAM6' column, but is there any way to retain such columns and get 'NA' as the output for 'HAM6' rather than deleting the column? I have tried most of the methods available for handling missing values, but couldn't find a suitable solution. Any help is much appreciated.
Use apply(managers2, 2, ...) and check whether the whole column is NA, as follows:
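apply(managers2, 2, function(x) {
  if (!all(is.na(x))) {
    # Column has at least one observation: compute VaR as usual
    return(as.numeric(VaR(x, p = .95, method = "modified")))
  } else {
    # Column is entirely missing: keep it in the output as NA
    return(NA)
  }
})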
Result:
VaR calculation produces unreliable result (inverse risk) for column: 1 : -0.00354267287759942
HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC LS EQ SP500 TR US 10Y TR US 3m TR
-0.03212244 -0.03698665 -0.04403660 -0.08093557 -0.12635656 NA -0.02275816 -0.06886077 -0.02510378 NA
The warning refers to US 3m TR (apply passes one column at a time, hence "column: 1"). That is why this column is also NA in the output.
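Since the original goal is a rolling VaR, the same column-wise NA check can be applied window by window. Below is a minimal sketch, not a definitive implementation. It assumes a 36-month window and requires at least 12 non-missing observations per column; both numbers are arbitrary choices for illustration. The names col_var, width, and min_obs are hypothetical and not part of PerformanceAnalytics.

```r
library(PerformanceAnalytics)  # also attaches xts and zoo
data(managers)

# Hypothetical helper: VaR for a single column, NA when the window
# holds fewer than min_obs non-missing returns (min_obs is an assumption).
col_var <- function(x, p = 0.95, min_obs = 12) {
  if (sum(!is.na(x)) < min_obs) return(NA_real_)
  as.numeric(VaR(x, p = p, method = "modified"))
}

width <- 36                          # window length in rows (assumed)
ends  <- seq(width, nrow(managers))  # last row of each window

# One row per window, one column per series
roll <- t(sapply(ends, function(i) {
  win <- managers[(i - width + 1):i, ]
  apply(win, 2, col_var)
}))

# Re-attach the window end dates as the time index
rolling_var <- xts(roll, order.by = index(managers)[ends])
```

Columns that are empty (or too sparse) in a given window come back as NA for that window only, so no column has to be dropped. Expect repeated "unreliable result" warnings for series such as US 3m TR.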
In addition to @Floo0's solution, a workaround is to impute the missing values with the mean return of the other series for the corresponding period.
See http://www.r-bloggers.com/missing-data-imputation/ for more information.
require(PerformanceAnalytics)
data(managers)
managers.df=as.data.frame(managers)
dateidx = as.Date(index(managers))
#Compute mean Return for each period
MeanReturn_PerPeriod=rowMeans(managers.df,na.rm=TRUE)
#Create copy of dataset for new values
managers.df.new=managers.df
#Impute NA values with the average return of the remaining series in the same period
for (x in seq_len(ncol(managers.df.new))) {
  na_rows = is.na(managers.df.new[, x])
  managers.df.new[na_rows, x] = MeanReturn_PerPeriod[na_rows]
}
managers_imputed=xts(managers.df.new,order.by=dateidx)
#Test VaR calculation
managers2<-managers_imputed[1:60,]
VaR(managers2, p=.95, method="modified")
#VaR calculation produces unreliable result (inverse risk) for column: 10 : -0.00354267287759942
# HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC LS EQ SP500 TR US 10Y TR
#VaR -0.03212244 -0.03491864 -0.0440366 -0.08093557 -0.02880137 -0.02696782 -0.02130781 -0.06886077 -0.02510378
# US 3m TR
#VaR NA