I have a dataset with "Time, Region, Sales" variables and I want to forecast sales for each region using ARIMA or ETS (SES) from library(forecast). There are 70 regions in total, each with 152 observations (3 years of data). Something like this:
Week Region Sales
01/1/2011 A 129
07/1/2011 A 140
14/1/2011 A 133
21/1/2011 A 189
... ... ...
01/12/2013 Z 324
07/12/2013 Z 210
14/12/2013 Z 155
21/12/2013 Z 386
28/12/2013 Z 266
So, I want R to treat every region as a separate dataset and run auto.arima on it. I am guessing a for loop would be an ideal fit here, but I failed miserably with it.
Ideally, the for loop would run something like this (an auto.arima for every block of 152 observations):
fit.A <- auto.arima(data$Sales[1:152])
fit.B <- auto.arima(data$Sales[153:304])
....
fit.Z <- auto.arima(data$Sales[10489:10640])
I came across this, but while converting the data frame into a time series, all I got was NAs.
Any help is appreciated! Thank you.
Try the very efficient data.table package (assuming your data set is called temp):
library(data.table)
library(forecast)
temp <- setDT(temp)[, list(AR = list(auto.arima(Sales))), by = Region]
The last step saves the results in temp as a list column (a list is the only format that can store this type of object).
Afterwards, you can do any operation you want on these lists, for example, inspecting them:
temp$AR
#[[1]]
# Series: Sales
# ARIMA(0,0,0) with non-zero mean
#
# Coefficients:
# intercept
# 147.7500
# s.e. 12.0697
#
# sigma^2 estimated as 582.7: log likelihood=-18.41
# AIC=40.82 AICc=52.82 BIC=39.59
#
#[[2]]
# Series: Sales
# ARIMA(0,0,0) with non-zero mean
#
# Coefficients:
# intercept
# 268.2000
# s.e. 36.4404
#
# sigma^2 estimated as 6639: log likelihood=-29.1
# AIC=62.19 AICc=68.19 BIC=61.41
Or plot the forecasts (and so on):
temp[, sapply(AR, function(x) plot(forecast(x, 10)))]
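If you want the forecast values themselves rather than plots, the same list column can be unpacked into a long table. This is only a sketch under assumptions: the toy data, the names fits, fc and h, and the 10-step horizon are made up for illustration and are not part of the original data set.

```r
library(data.table)
library(forecast)

# Hypothetical toy data: 3 regions with 152 observations each
set.seed(1)
temp <- data.table(
  Region = rep(c("A", "B", "C"), each = 152),
  Sales  = round(runif(3 * 152, min = 100, max = 400))
)

# One auto.arima fit per region, stored in a list column (as above)
fits <- temp[, list(AR = list(auto.arima(Sales))), by = Region]

# Assumed forecast horizon
h <- 10

# Within each group AR is a list of length 1, so AR[[1]] is the fitted model;
# $mean holds the point forecasts
fc <- fits[, list(Step     = seq_len(h),
                  Forecast = as.numeric(forecast(AR[[1]], h = h)$mean)),
           by = Region]
```

fc then has one row per region and forecast step, which is usually easier to export or merge than a list of forecast objects.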
You can do this easily with dplyr. Assuming your data frame is named df, run:
library(dplyr)
library(forecast)
model_fits <- group_by(df, Region) %>% do(fit=auto.arima(.$Sales))
The result is a data frame containing the model fits for each region:
> head(model_fits)
Source: local data frame [6 x 2]
Groups: <by row>
Region fit
1 A <S3:Arima>
2 B <S3:Arima>
3 C <S3:Arima>
4 D <S3:Arima>
5 E <S3:Arima>
6 F <S3:Arima>
You can get a list with each model fit like so:
> model_fits$fit
[[1]]
Series: .$Sales
ARIMA(0,0,0) with non-zero mean
Coefficients:
intercept
196.0000
s.e. 14.4486
sigma^2 estimated as 2088: log likelihood=-52.41
AIC=108.82 AICc=110.53 BIC=109.42
[[2]]
Series: .$Sales
ARIMA(0,0,0) with non-zero mean
Coefficients:
intercept
179.2000
s.e. 14.3561
sigma^2 estimated as 2061: log likelihood=-52.34
AIC=108.69 AICc=110.4 BIC=109.29
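To go from the fits to actual forecasts, one option is to name the list by region and call forecast() on each element. A minimal sketch, assuming toy data and a 10-step horizon; the names fits and point_fc are hypothetical and only used here for illustration.

```r
library(dplyr)
library(forecast)

# Hypothetical toy data: 3 regions with 152 observations each
set.seed(1)
df <- data.frame(
  Region = rep(c("A", "B", "C"), each = 152),
  Sales  = round(runif(3 * 152, min = 100, max = 400))
)

# Same call as above: one auto.arima fit per region
model_fits <- group_by(df, Region) %>% do(fit = auto.arima(.$Sales))

# Name each fit by its region so it can be looked up directly
fits <- setNames(model_fits$fit, as.character(model_fits$Region))

# Point forecasts for the next 10 steps: one column per region
point_fc <- sapply(fits, function(m) as.numeric(forecast(m, h = 10)$mean))
```

fits[["A"]] gives the model for a single region, and point_fc is a matrix with 10 rows and one column per region.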