I have the following sample data:
library(data.table)
dt <- data.table('time' = c(1:10),
'units'= c(89496264,81820040,80960072,109164545,96226255,96270421,95694992,117509717,105134778,0))
I would like to make a forecast for the units at time = 10.
I can see that at time = 4*k, where k = 1, 2, ..., there is a big increase in units, and I would like to include that as a seasonality factor.
How could I do that in R? I have looked into auto.arima, but it seems that it is not the way to go.
Thanks
The Prophet API lets you compute predictions easily. It uses an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality.
Quote from the link above:
It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
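install.packages('prophet')
library(prophet)
# prophet() expects a data frame with columns `ds` (dates) and `y` (values).
# Assumption: the integer time index is mapped to consecutive days from an
# arbitrary start date, and the placeholder row (time = 10, units = 0) is dropped.
df <- data.frame(ds = as.Date('2020-01-01') + dt$time[1:9] - 1, y = dt$units[1:9])
model <- prophet(df) # see ?prophet for details, this builds the model (like auto.arima)
future <- make_future_dataframe(model, periods = 10) # creates the "future" data
forecast <- predict(model, future) # predictions
tail(forecast)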
Here is the complete example in R.
You are right: you can bet at 98.4% that there is a seasonality at t = 4*k, and its value is +21108156. If the seasonality is assumed to be multiplicative rather than additive, you can bet at 98.5% that there is a seasonality, and its value is +18.7%.
This is how I proceed, without using a ready-made package, so that you can answer all kinds of similar questions yourself.
First, introduce a new boolean variable dt$season = (dt$time %% 4)==0, which is true (i.e. 1) for t = 0, 4, 8, ... and false (i.e. 0) elsewhere. Then the function x ~ a*season + b is equal to a+b for t = 0, 4, 8, ... and to b elsewhere. In other words, a is the difference between the seasonal level and the non-seasonal level.
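To make the role of a and b concrete, here is a minimal sketch that builds the indicator and compares the two group means directly. It uses a base R data.frame instead of data.table to stay dependency-free, and it assumes that the 0 at time = 10 is only a placeholder for the value to forecast, so that row is left out; the names obs, a and b are just illustrative.

```r
# Sample data from the question
dt <- data.frame(
  time  = 1:10,
  units = c(89496264, 81820040, 80960072, 109164545, 96226255,
            96270421, 95694992, 117509717, 105134778, 0)
)

# Assumption: the 0 at time = 10 is a placeholder, so keep only times 1 to 9
obs <- dt[dt$time < 10, ]

# Indicator: TRUE when time is a multiple of 4, FALSE otherwise
obs$season <- (obs$time %% 4) == 0

# Mean units in each group (names are "FALSE" and "TRUE")
group_means <- tapply(obs$units, obs$season, mean)

b <- unname(group_means["FALSE"])                       # non-seasonal level
a <- unname(group_means["TRUE"] - group_means["FALSE"]) # seasonal jump

print(group_means)
print(a)
```

With a single 0/1 regressor, the least-squares coefficient is exactly this difference of group means, which is why the regression and this hand computation agree.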
The linear regression fit <- lm(units ~ season, data = dt), fitted on the nine observed rows (the placeholder 0 at time = 10 has to be left out to get these numbers), gives you a = 21108156, and summary(fit) tells you that the standard error on a is 6697979, so that an estimate as large as a = 21108156 has a probability of about 0.0161 of appearing if the true value were 0. So you can reasonably bet that there is a 4-cycle seasonality, at a confidence level of about 1 - 0.0161 = 98.39%.
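The additive fit described above can be sketched as follows. This is only an illustration under the same assumption as before (the row at time = 10 is a placeholder and is excluded); the variable names are mine. The last lines show what this very simple model, which has no trend term, would predict for time = 10, where the indicator is FALSE because 10 is not a multiple of 4.

```r
# Sample data from the question
dt <- data.frame(
  time  = 1:10,
  units = c(89496264, 81820040, 80960072, 109164545, 96226255,
            96270421, 95694992, 117509717, 105134778, 0)
)

# Assumption: drop the placeholder row at time = 10
obs <- dt[dt$time < 10, ]
obs$season <- (obs$time %% 4) == 0

# Additive model: units = a * season + b
fit <- lm(units ~ season, data = obs)
coefs <- summary(fit)$coefficients
print(coefs)

# For a logical regressor, R names the coefficient "seasonTRUE"
a_hat <- coefs["seasonTRUE", "Estimate"]   # seasonal jump a
a_se  <- coefs["seasonTRUE", "Std. Error"] # standard error on a
a_p   <- coefs["seasonTRUE", "Pr(>|t|)"]   # two-sided p-value for a = 0

# Prediction of this model for time = 10 (not a multiple of 4)
forecast_10 <- unname(predict(fit, newdata = data.frame(season = FALSE)))
print(forecast_10)
```

Since the model contains nothing but the seasonal indicator, the prediction for a non-seasonal time is simply the intercept b; adding time as a second regressor would be one way to also capture a trend.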
If you assume the seasonality is multiplicative, use the same reasoning with the variable dt$mult = dt$units * dt$season. This time, a * dt$mult + b is equal to a * dt$units + b when the seasonality applies and to b when it does not. So the seasonality brings a difference of a * dt$units, that is, a share a = 0.1877 = 18.77% of the units, with a significance level of 0.01471, i.e. about 1 - 98.5%.
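The multiplicative variant can be sketched the same way. Again this is only an illustration, assuming the row at time = 10 is a placeholder and is excluded; mult follows the definition in the text, and the other names are mine.

```r
# Sample data from the question
dt <- data.frame(
  time  = 1:10,
  units = c(89496264, 81820040, 80960072, 109164545, 96226255,
            96270421, 95694992, 117509717, 105134778, 0)
)

# Assumption: drop the placeholder row at time = 10
obs <- dt[dt$time < 10, ]
obs$season <- (obs$time %% 4) == 0

# mult equals units at seasonal times and 0 elsewhere
obs$mult <- obs$units * obs$season

# Multiplicative-style model: units = a * mult + b
fit_mult <- lm(units ~ mult, data = obs)
coefs_mult <- summary(fit_mult)$coefficients
print(coefs_mult)

a_mult <- coefs_mult["mult", "Estimate"]  # seasonal share a
p_mult <- coefs_mult["mult", "Pr(>|t|)"]  # two-sided p-value for a = 0
```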
That's how ready-made packages work.