VaR calculation with a completely missing column

Posted 2019-07-20 07:42

Question:

I need to calculate the rolling VaR of stock returns. From the post "Using rollapply function for VaR calculation using R", I understand that columns consisting entirely of missing values cause an error. But since the start and end dates of the return series differ across firms, converting the data from long to wide format creates missing values. Estimation could be restricted to rows with no missing values, but that leads to a serious loss of data. Is there any way to perform the calculation when some columns are completely missing, returning NA for those columns? This is what I did:

library(PerformanceAnalytics)
data(managers)
VaR(managers, p=.95, method="modified")

This performs the desired calculation, but when I try it on the first 60 rows, where the 'HAM6' column is completely missing:

managers2<-managers[1:60,]
VaR(managers2, p=.95, method="modified")

I get the following error:

Error in dimnames(cd) <- list(as.character(index(x)), colnames(x)) :
'dimnames' applied to non-array

I understand that the error is due to the completely missing 'HAM6' column, but is there any way to keep such columns and get NA for 'HAM6' rather than deleting the column? I have tried most of the usual methods for handling missing values but couldn't find a suitable solution. Any help is much appreciated.

Answer 1:

Use apply(managers2, 2, ...) and check whether the whole column is NA before calling VaR:

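# apply() hands each column to the function as a plain numeric vector
apply(managers2, 2, function(x) {
  if (!all(is.na(x))) {
    # at least one observation: compute the modified VaR for this column
    as.numeric(VaR(x, p = .95, method = "modified"))
  } else {
    # column is entirely missing: report NA instead of failing
    NA
  }
})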

Result:

VaR calculation produces unreliable result (inverse risk) for column: 1 : -0.00354267287759942
       HAM1        HAM2        HAM3        HAM4        HAM5        HAM6 EDHEC LS EQ    SP500 TR   US 10Y TR    US 3m TR 
-0.03212244 -0.03698665 -0.04403660 -0.08093557 -0.12635656          NA -0.02275816 -0.06886077 -0.02510378          NA

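The warning refers to US 3m TR (it is reported as column 1 because apply passes each column to VaR on its own). That unreliable result is why US 3m TR also shows NA, even though the column is not missing.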
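The same guard can be wrapped in a small reusable helper. The following is only a sketch: safe_var is a hypothetical name, not a function from PerformanceAnalytics, and it assumes the input has column names. It uses vapply so that the result is always a named numeric vector.

```r
library(PerformanceAnalytics)
data(managers)

# Hypothetical helper (not part of PerformanceAnalytics): returns NA for
# columns with no observations, otherwise the VaR of that column.
safe_var <- function(R, p = 0.95, method = "modified") {
  vapply(colnames(R), function(nm) {
    x <- R[, nm]
    # Skip VaR() entirely when the column has no data at all
    if (all(is.na(x))) return(NA_real_)
    as.numeric(VaR(x, p = p, method = method))
  }, numeric(1))
}

res <- safe_var(managers[1:60, ])
print(res)
```

Columns where VaR itself gives up, such as US 3m TR above, would still come back as NA together with the warning.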



Answer 2:

In addition to @Floo0's solution, one workaround is to impute the missing values with the mean return of the corresponding period, taken across the other columns. See http://www.r-bloggers.com/missing-data-imputation/ for more information.

require(PerformanceAnalytics)
data(managers)

managers.df=as.data.frame(managers)

dateidx = as.Date(index(managers))


# Compute the mean return across all columns for each period
MeanReturn_PerPeriod=rowMeans(managers.df,na.rm=TRUE)

# Create a copy of the dataset to hold the imputed values
managers.df.new=managers.df

# Replace each NA with the mean return of the other columns in the same period
for(x in seq_len(ncol(managers.df.new))) {
  missing=is.na(managers.df.new[,x])
  managers.df.new[missing,x]=MeanReturn_PerPeriod[missing]
}

managers_imputed=xts(managers.df.new,order.by=dateidx)

#Test VaR calculation

managers2<-managers_imputed[1:60,]
VaR(managers2, p=.95, method="modified")
#VaR calculation produces unreliable result (inverse risk) for column: 10 : -0.00354267287759942
#           HAM1        HAM2       HAM3        HAM4        HAM5        HAM6 EDHEC LS EQ    SP500 TR   US 10Y TR
#VaR -0.03212244 -0.03491864 -0.0440366 -0.08093557 -0.02880137 -0.02696782 -0.02130781 -0.06886077 -0.02510378
#    US 3m TR
#VaR       NA
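The row-mean imputation can also be written without a loop. The sketch below uses base R only and a small made-up matrix (the values and the names returns, row_mean and imputed are illustrative, not taken from the managers data) so that the result is easy to check by hand.

```r
# Toy return matrix (made-up numbers): rows are periods, columns are assets
returns <- matrix(
  c(0.01, NA,   0.03,
    NA,   0.02, 0.04,
    0.05, 0.01, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Mean return of each period, ignoring missing values
row_mean <- rowMeans(returns, na.rm = TRUE)

# Replace every NA with the mean of its own row in a single assignment
missing <- is.na(returns)
imputed <- returns
imputed[missing] <- row_mean[row(returns)[missing]]

print(imputed)
```

A period in which every column is missing would get NaN from rowMeans, so such rows need separate handling. Imputation also changes the estimates for partially observed columns: in the outputs above, HAM5 moves from -0.12635656 without imputation to -0.02880137 with it.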