I have a pandas Series with monthly data (df.sales). I needed to subtract the value from 12 months earlier before fitting a time series model, so I ran this command:
sales_new = df.sales.diff(periods=12)
I then fit an ARMA model, and predicted the future:
model = ARMA(sales_new, order=(2,0)).fit()
model.predict('2015-01-01', '2017-01-01')
Because I had diffed the sales data, the model predicts forward diffs rather than sales. If this were a diff of period 1, I would just use np.cumsum(), but because the period is 12, it is a bit trickier.
What is the best way to "unroll" the diff and turn it back into the scale of the original data?
I think you need to compute each later value from the value 12 months before it, starting from the original values for the first 12 months:
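from datetime import date
import numpy as np
import pandas as pd

periods = 12
# 24 months of random month-end data (newer pandas versions prefer freq='ME')
df = pd.DataFrame(data={'value': np.random.random(size=24)}, index=pd.date_range(start=date(2014, 1, 1), freq='M', periods=24))
diffs = df.diff(periods=periods)
# keep the first 12 original values and blank out everything after them
restored = df.copy()
restored.iloc[periods:] = np.nan
# rebuild each value as (value 12 months earlier) + (seasonal diff)
for d, val in diffs.iloc[periods:].iterrows():
    restored.loc[d, 'value'] = restored.loc[d - pd.DateOffset(months=periods), 'value'] + val['value']
res = pd.concat([df, diffs, restored], axis=1)
res.columns = ['original', 'diffs', 'restored']
print(res)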
original diffs restored
2014-01-31 0.926367 NaN 0.926367
2014-02-28 0.688898 NaN 0.688898
2014-03-31 0.297025 NaN 0.297025
2014-04-30 0.139094 NaN 0.139094
2014-05-31 0.375082 NaN 0.375082
2014-06-30 0.490638 NaN 0.490638
2014-07-31 0.789683 NaN 0.789683
2014-08-31 0.236841 NaN 0.236841
2014-09-30 0.263245 NaN 0.263245
2014-10-31 0.547025 NaN 0.547025
2014-11-30 0.243444 NaN 0.243444
2014-12-31 0.385028 NaN 0.385028
2015-01-31 0.823224 -0.103142 0.823224
2015-02-28 0.828245 0.139347 0.828245
2015-03-31 0.753291 0.456266 0.753291
2015-04-30 0.447670 0.308576 0.447670
2015-05-31 0.936667 0.561584 0.936667
2015-06-30 0.223049 -0.267589 0.223049
2015-07-31 0.933942 0.144259 0.933942
2015-08-31 0.325726 0.088886 0.325726
2015-09-30 0.947526 0.684281 0.947526
2015-10-31 0.524749 -0.022276 0.524749
2015-11-30 0.431671 0.188227 0.431671
2015-12-31 0.234028 -0.151000 0.234028
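The same recursion can be wrapped in a small helper that also works past the first reconstructed year, which is the case for a forecast longer than 12 months. This is only a sketch: `undo_seasonal_diff` and its `seed` argument are hypothetical names, and it assumes `seed` holds the 12 original values immediately preceding the first diff.

```python
import numpy as np
import pandas as pd


def undo_seasonal_diff(diffs, seed, periods=12):
    """Invert series.diff(periods), given the `periods` original values before `diffs`."""
    seed = np.asarray(seed, dtype=float)
    values = np.asarray(diffs, dtype=float)
    if len(seed) != periods:
        raise ValueError("seed must contain exactly `periods` original values")
    restored = np.empty(len(values))
    for i, d in enumerate(values):
        # value one season back: from the seed at first, then from our own output
        prev = seed[i] if i < periods else restored[i - periods]
        restored[i] = prev + d
    index = diffs.index if isinstance(diffs, pd.Series) else None
    return pd.Series(restored, index=index)


# Round trip on synthetic data: 36 months, so the last 12 values
# are rebuilt from already-restored values rather than from the seed.
rng = np.random.default_rng(0)
original = pd.Series(rng.random(36), index=pd.date_range("2014-01-01", periods=36, freq="MS"))
diffs = original.diff(12).iloc[12:]
restored = undo_seasonal_diff(diffs, original.iloc[:12])
print(np.allclose(restored, original.iloc[12:]))
```

For the forecasting case in the question, `seed` would presumably be the last 12 observed sales values and `diffs` the output of `model.predict(...)`, provided the forecast starts in the month right after the last observation.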
This should do it for a plain diff (period 1):
def rebuild_diffed(series, first_element_original):
    # cumsum undoes diff(); the leading NaN left by diff() becomes 0
    cumsum = series.cumsum()
    return cumsum.fillna(0) + first_element_original
Step by step version:
import pandas as pd

# making some data
a = pd.Series([2, 6, 4, 6, 2])
print(a)
a_diff = a.diff()
print(a_diff)
# Rebuilding
a_diff_cumsum = a_diff.cumsum()
print(a_diff_cumsum)
rebuilt = a_diff_cumsum.fillna(0) + 2
print(rebuilt)
print(rebuilt == a)
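The cumsum approach inverts a diff of period 1. For a diff of period 12, one possible extension is to run a separate cumsum for each position within the season and add back the matching original value. The sketch below is an assumption-laden illustration, not part of the answer above: `rebuild_seasonal_diffed` is a hypothetical name, `first_values` is assumed to hold the first `periods` original values, and the only NaNs in the diffed series are assumed to be the leading ones produced by `diff`.

```python
import numpy as np
import pandas as pd


def rebuild_seasonal_diffed(diffed, first_values, periods=12):
    """Invert series.diff(periods) with one cumulative sum per seasonal position."""
    # position of each observation within its season: 0, 1, ..., periods-1, 0, 1, ...
    phase = np.arange(len(diffed)) % periods
    # cumulative sum of diffs within each seasonal position
    cumulative = diffed.fillna(0).groupby(phase).cumsum()
    # add back the original starting value for that position
    offsets = np.asarray(first_values, dtype=float)[phase]
    return cumulative + offsets


# Small check with a period of 3 so the numbers are easy to follow
a = pd.Series([2.0, 6.0, 4.0, 6.0, 2.0, 8.0, 3.0, 7.0, 5.0, 1.0])
a_diff = a.diff(3)
rebuilt = rebuild_seasonal_diffed(a_diff, a.iloc[:3], periods=3)
print((rebuilt == a).all())
```

This avoids the explicit Python loop, at the cost of needing the whole diffed series to start at the beginning of a season boundary relative to `first_values`.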