R, Time Series, ARIMA Model, Forecasting, Daily Data

Posted 2019-02-07 07:40

Question:

I am trying to do demand forecasting with daily data from Jan 16, 2012 to Oct 10, 2013, but the forecasts are very poor. Any idea why?

Plotted, the data show both weekly and monthly seasonality: for example, demand is higher on weekdays and lower on weekends.
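A quick way to confirm a weekday/weekend pattern like this is to average demand by day of the week, and by day of the month for the monthly pattern. The sketch below is base R only and, because the question's data are not available, uses a synthetic series; the names `dates`, `demand`, `dow_means` and `dom_means` are hypothetical and the real values should be substituted for `demand`.

```r
# Hypothetical synthetic stand-in for the demand series (not the asker's data):
# weekdays around 100, weekends around 60, plus noise.
dates <- seq(as.Date("2012-01-16"), as.Date("2013-10-10"), by = "day")
wday <- as.POSIXlt(dates)$wday            # 0 = Sunday, ..., 6 = Saturday
is_weekend <- wday %in% c(0, 6)
set.seed(1)
demand <- ifelse(is_weekend, 60, 100) + rnorm(length(dates), sd = 5)

# Mean demand by day of the week: a weekly pattern shows up as clearly
# different means for weekdays and weekend days.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
dow_means <- tapply(demand, factor(wday, levels = 0:6, labels = day_labels), mean)
print(round(dow_means, 1))

# Mean demand by day of the month, to look for a within-month pattern.
dom_means <- tapply(demand, as.POSIXlt(dates)$mday, mean)
print(round(dom_means, 1))
```

If the day-of-month means look flat while the day-of-week means differ strongly, the "monthly" effect may be weaker than it appears in the plot.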

In the forecast plot, the black line is the actual data and the blue line is the forecast. The forecast was produced with:

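 library(forecast)  # provides auto.arima(), msts(), tbats(), fourier(), forecast()
 x <- ts(data, frequency = 7, start = c(3, 2))  # daily data with a weekly cycle
 fit <- auto.arima(x)
 pred <- forecast(fit, h = 300)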

I did a lot of research on fitting ARIMA models to daily data, and since there is weekly seasonality, I chose freq=7.
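With frequency 7, the `start` argument is given as (cycle, position within cycle). The base-R sketch below derives `start = c(3, 2)` from the first date instead of hard-coding it, assuming (as the question's values imply) that 7-day cycles are counted from Jan 1, 2012. The variable names are hypothetical, and `demand` is only a placeholder for the real data.

```r
# Derive ts() start values for daily data with a weekly cycle (frequency = 7).
# Assumption: cycles are counted in 7-day blocks starting on Jan 1, 2012,
# which is how start = c(3, 2) in the question lines up with Jan 16, 2012.
origin     <- as.Date("2012-01-01")
first_day  <- as.Date("2012-01-16")
last_day   <- as.Date("2013-10-10")

offset     <- as.integer(first_day - origin)    # days since origin: 15
start_week <- offset %/% 7 + 1                  # 3rd 7-day block
start_day  <- offset %% 7 + 1                   # 2nd day within that block
n_days     <- as.integer(last_day - first_day) + 1

# Placeholder values (hypothetical); substitute the real daily demand here.
demand <- numeric(n_days)
x <- ts(demand, frequency = 7, start = c(start_week, start_day))
print(c(start_week, start_day))
```

Because Jan 1, 2012 was a Sunday, position 1 in each block is a Sunday and position 2 (Jan 16) is a Monday, so the cycle positions line up with days of the week.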

However, the forecasts were poor, so someone kindly pointed me to a method Professor Hyndman shared for fitting models with multiple seasonalities:

https://stats.stackexchange.com/questions/74418/frequency-of-time-series-in-r/74426#74426

I took that advice and fitted models using the two methods from the link above.

Method 1: using the tbats() function.

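# Two seasonal periods: weekly (7 days) and roughly annual (7*52 = 364 days)
x_new <- msts(x, seasonal.periods = c(7, 7 * 52))
fit <- tbats(x_new)
fc <- forecast(fit, h = 7 * 52)  # forecast one "year" of 364 days ahead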

I used a weekly seasonal period of 7 and an annual seasonal period of 7*52, since I haven't found an easy way to capture the monthly seasonality. The resulting forecast is not good either. Note: using 7*4 as the second (monthly) seasonal period gives an even worse forecast.
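One thing worth checking before blaming the model is how many complete cycles of each candidate period the sample actually contains. The base-R arithmetic below uses the question's date range; the period names are illustrative. With 634 days, the annual cycle is observed less than twice, which leaves little information for estimating an annual seasonal shape. Also, 7*4 = 28 days drifts against calendar months (28 to 31 days), and 7*52 = 364 is a day or two short of a year; non-integer periods such as 365.25/12 and 365.25 are an alternative that msts() accepts.

```r
# Length of the sample described in the question (Jan 16, 2012 to Oct 10, 2013)
n_days <- as.integer(as.Date("2013-10-10") - as.Date("2012-01-16")) + 1

# Candidate seasonal periods, in days
periods <- c(weekly      = 7,
             four_weeks  = 7 * 4,
             monthly_avg = 365.25 / 12,
             annual_364  = 7 * 52,
             annual      = 365.25)

# Number of complete cycles of each period contained in the sample
cycles <- floor(n_days / periods)
print(cycles)
```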

Method 2: using Fourier terms as xreg regressors.

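# Fourier terms for the weekly (7) and approximately annual (364-day) cycles
seas1 <- fourier(x, K = 1)
seas2 <- fourier(ts(x, frequency = 7 * 52), K = 1)
fit <- auto.arima(x, xreg = cbind(seas1, seas2))
# fourierf() is deprecated; fourier() with h = returns the future terms
seas1.f <- fourier(x, K = 1, h = 7 * 52)
seas2.f <- fourier(ts(x, frequency = 7 * 52), K = 1, h = 7 * 52)
fc1 <- forecast(fit, xreg = cbind(seas1.f, seas2.f))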

I tried different values of K, but none of them improved the forecast.
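To see what K controls: forecast's fourier(x, K) returns K sine/cosine pairs for the series' seasonal period, so K = 1 allows only a single smooth wave per cycle, and for a period of 7 at most 3 pairs are distinct. The base-R sketch below builds that kind of regressor by hand and fits a regression with AR(1) errors via stats::arima(). The helper `fourier_terms()` is hypothetical (not the forecast package's fourier(), whose time indexing may differ), the data are synthetic, and the 365.25-day annual period is an assumption.

```r
# Hypothetical helper (not the forecast package's fourier()): for period m and
# harmonics k = 1..K, returns columns sin(2*pi*k*t/m) and cos(2*pi*k*t/m).
fourier_terms <- function(t, m, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  })
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2), "_", m)
  X
}

# Synthetic daily demand with weekly and annual cycles (hypothetical data).
set.seed(42)
n <- 634
h <- 28
t_fit <- seq_len(n)
t_new <- n + seq_len(h)
y <- 100 + 20 * sin(2 * pi * t_fit / 7) + 10 * cos(2 * pi * t_fit / 365.25) +
  rnorm(n, sd = 3)

# K = 2 pairs for the weekly cycle, K = 3 for an assumed 365.25-day cycle.
X_fit <- cbind(fourier_terms(t_fit, 7, 2), fourier_terms(t_fit, 365.25, 3))
X_new <- cbind(fourier_terms(t_new, 7, 2), fourier_terms(t_new, 365.25, 3))

# Regression on the Fourier terms with AR(1) errors; the seasonality is
# carried entirely by xreg, so no seasonal ARIMA part is used.
fit <- arima(y, order = c(1, 0, 0), xreg = X_fit)
fc <- predict(fit, n.ahead = h, newxreg = X_new)
print(round(coef(fit)[c("S1_7", "C1_7")], 1))
```

In the question's Method 2, x has frequency 7, so auto.arima() may also add seasonal ARIMA terms on top of the Fourier terms; passing seasonal = FALSE is a common choice when the Fourier terms are meant to carry the seasonality.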

So I am stuck: the forecasts are way off. Could anyone point out my mistakes, or suggest how to improve the model?

Many thanks!

Answer 1:

You are missing the impact of holidays, the lead or lag effects around holidays, and outliers (pulse outliers, level shifts, changes in trend, and changes in day-of-the-week effects, i.e. seasonal pulses). Unless you deal with these, you can't get a good read on the day-of-the-week patterns. Can you post your data to dropbox.com so I can take a look? Please specify the beginning date and the country the data are from.
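One common way to make these effects visible to a model is to encode them as regressors. The base-R sketch below is not the answerer's procedure (they ask for the data first) and does not show automatic outlier detection. It builds dummies for holidays, for the day before and after each holiday (lead/lag), and a step dummy for a level shift, then fits a regression with AR(1) errors. The holiday dates, the level-shift date and the demand series are all illustrative assumptions. The same kind of matrix can be passed as xreg to forecast::auto.arima().

```r
dates <- seq(as.Date("2012-01-16"), as.Date("2013-10-10"), by = "day")
idx <- seq_along(dates)

# Illustrative holiday dates (assumed, not from the question); use the real
# holiday calendar for the country the data come from.
holidays <- as.Date(c("2012-05-28", "2012-07-04", "2012-12-25",
                      "2013-05-27", "2013-07-04"))

holiday  <- as.integer(dates %in% holidays)        # pulse on the holiday itself
pre_hol  <- as.integer((dates + 1) %in% holidays)  # day before a holiday (lead)
post_hol <- as.integer((dates - 1) %in% holidays)  # day after a holiday (lag)
shift    <- as.integer(dates >= as.Date("2013-03-01"))  # assumed level shift

# Weekly cycle as one sine/cosine pair so the dummies are not confused with it.
X <- cbind(holiday, pre_hol, post_hol, shift,
           wk_sin = sin(2 * pi * idx / 7), wk_cos = cos(2 * pi * idx / 7))

# Synthetic demand with a weekly cycle, holiday effects and a level shift.
set.seed(7)
y <- 100 + 20 * sin(2 * pi * idx / 7) - 40 * holiday - 10 * pre_hol +
  15 * shift + rnorm(length(idx), sd = 5)

# Regression with AR(1) errors; the dummy coefficients estimate each effect.
fit <- arima(y, order = c(1, 0, 0), xreg = X)
print(round(coef(fit)[c("holiday", "pre_hol", "post_hol", "shift")], 1))
```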