Timeseries Crossvalidation in R: using tsCV() with

Posted 2019-05-31 21:58

Question:

I am currently trying to evaluate a tslm model using time series cross-validation. I want to use a fixed model (without parameter re-estimation) and look at the 1- to 3-step-ahead forecasts over an evaluation period covering the last year.

I am having trouble getting tsCV and tslm from the forecast package to work together. What am I missing?

library(forecast)
library(ggfortify)

AirPassengers_train <- head(AirPassengers, 100)
AirPassengers_test  <- tail(AirPassengers, 44)

## Holdout Evaluation
n_train <- length(AirPassengers_train)
n_test  <- length(AirPassengers_test)
pred_train <- ts(rnorm(n_train))
pred_test  <- ts(rnorm(n_test))

fit <- tslm(AirPassengers_train ~ trend + pred_train)

forecast(fit, newdata = data.frame(pred_train = pred_test)) %>% 
  accuracy(AirPassengers_test)
#>                        ME     RMSE      MAE       MPE     MAPE     MASE
#> Training set 1.135819e-15 30.03715 23.41818 -1.304311 10.89785 0.798141
#> Test set     3.681350e+01 76.39219 55.35298  6.513998 11.96379 1.886546
#>                   ACF1 Theil's U
#> Training set 0.6997632        NA
#> Test set     0.7287923  1.412804


## tsCV Evaluation
fc_reg <- function(x) forecast(x, newdata = data.frame(pred_train = pred_test),
                               h = h, model = fit)

tsCV(AirPassengers_test, fc_reg, h = 1)
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1957                  NA  NA  NA  NA  NA  NA  NA  NA
#> 1958  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
#> 1959  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
#> 1960  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA


forecast(AirPassengers_test, newdata = data.frame(pred_train = pred_test),
         h = 1, model = fit)
#> Error in forecast.ts(AirPassengers_test, newdata = data.frame(pred_train = pred_test),
#> : Unknown model class

I have a feeling that https://gist.github.com/robjhyndman/d9eb5568a78dbc79f7acc49e22553e96 is relevant. How would I apply it to the scenario above?

Answer 1:

For time series cross-validation, you should be fitting a separate model to every training set, not passing an existing model. With predictor variables, the function needs to be able to grab the relevant elements when fitting each model, and other elements when producing forecasts.

The following will work.

fc <- function(y, h, xreg)
{
  if(NROW(xreg) < length(y) + h)
    stop("Not enough xreg data for forecasting")
  X <- xreg[seq_along(y),]
  fit <- tslm(y ~ X)
  X <- xreg[length(y)+seq(h),]
  forecast(fit, newdata=X)
}

# Predictors of the same length as the data
# and with the same time series characteristics.    
pred <- ts(rnorm(length(AirPassengers)), start=start(AirPassengers),
           frequency=frequency(AirPassengers))

# Now pass the whole time series and the corresponding predictors 
tsCV(AirPassengers, fc, xreg=pred)

If you have more than one predictor variable, then xreg should be a matrix.
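
Which arguments tsCV() hands to the forecast function has depended on the package version. The documentation for recent releases of forecast says that, when exogenous predictors are used, the function must accept both xreg (the training rows) and newxreg (the future rows). Under that assumption, a minimal sketch could look like the following. It swaps tslm for Arima() with order = c(0, 0, 0), which is a plain regression on the predictors, because that pattern mirrors the example in the tsCV help page. The names fc_x, xmat and e_x, and the random predictor, are made up for illustration.

```r
library(forecast)

set.seed(1)  # the predictor is random noise, purely for illustration

# Two predictor columns: a linear trend and a made-up regressor.
xmat <- cbind(trend = seq_along(AirPassengers),
              pred  = rnorm(length(AirPassengers)))

# Assumed signature for recent forecast versions: tsCV() passes the
# training rows as `xreg` and the next h rows as `newxreg`.
fc_x <- function(y, h, xreg, newxreg) {
  fit <- Arima(y, order = c(0, 0, 0), xreg = xreg)  # regression, no ARMA terms
  forecast(fit, xreg = newxreg)
}

# Errors for horizons 1 to 3; rows are indexed by forecast origin.
e_x <- tsCV(AirPassengers, fc_x, h = 3, xreg = xmat)
```

If every entry of e_x comes back as NA, the forecast function is probably failing inside tsCV(), which catches errors silently. Calling fc_x directly on a subset of the data shows the underlying message.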



Answer 2:

I ended up using a function to forecast a trend. I'm not sure whether this is correctly specified, but the RMSE looks about right.

flm <- function(y, h) { forecast(tslm(y ~ trend, lambda=0), h=h) }

e <- tsCV(tsDF, flm, h=6)
sqrt(mean(e^2, na.rm=TRUE))
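
With h greater than 1, tsCV() returns a matrix with one column per horizon, so the single pooled RMSE above can be broken down by horizon. The sketch below assumes AirPassengers in place of tsDF (which is not shown in the answer); rmse_by_h, e_last and rmse_last are illustrative names. Because the rows of the error matrix are indexed by the last period of the training data, window() can restrict the evaluation to forecast origins in the final year.

```r
library(forecast)

# Log-linear trend model, refitted on each training set by tsCV().
flm <- function(y, h) forecast(tslm(y ~ trend, lambda = 0), h = h)

# One column of errors per horizon (h = 1, 2, 3).
e <- tsCV(AirPassengers, flm, h = 3)

# RMSE by horizon over all forecast origins.
rmse_by_h <- sqrt(colMeans(e^2, na.rm = TRUE))

# Restrict to forecast origins in the final year of the series.
e_last <- window(e, start = c(1960, 1))
rmse_last <- sqrt(colMeans(e_last^2, na.rm = TRUE))
```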

@robhyndman