Undo a Series Diff

2020-06-21 07:16发布

问题:

I have a pandas Series with monthly data (df.sales). I needed to subtract the data 12 months earlier to fit a time series, so I ran this command:

sales_new = df.sales.diff(periods=12)

I then fit an ARMA model, and predicted the future:

model = ARMA(sales_new, order=(2,0)).fit()
model.predict('2015-01-01', '2017-01-01')

Because I had diffed the sales data, when I use the model to predict, it predicts forward diffs. If this was diff of period 1, I would just use an np.cumsum(), but because this is period 12, it makes it a bit tricker.

What is the best way to "unroll" the diff and turn it back into the scale of the original data?

回答1:

I think you need to calculate the future values off the values for the first 12 months:

periods = 12
df = pd.DataFrame(data={'value': np.random.random(size=24)}, index=pd.date_range(start=date(2014, 1,1), freq='M', periods=24))
diffs = df.diff(periods=periods)

restored = df.copy()
restored.iloc[periods:] = np.nan
for d, val in diffs.iloc[periods:].iterrows():
    restored.loc[d] = restored.loc[d - pd.DateOffset(months=periods)].value + val

res = pd.concat([df, diffs, restored], axis=1)
res.columns = ['original', 'diffs', 'restored']

            original     diffs  restored
2014-01-31  0.926367       NaN  0.926367
2014-02-28  0.688898       NaN  0.688898
2014-03-31  0.297025       NaN  0.297025
2014-04-30  0.139094       NaN  0.139094
2014-05-31  0.375082       NaN  0.375082
2014-06-30  0.490638       NaN  0.490638
2014-07-31  0.789683       NaN  0.789683
2014-08-31  0.236841       NaN  0.236841
2014-09-30  0.263245       NaN  0.263245
2014-10-31  0.547025       NaN  0.547025
2014-11-30  0.243444       NaN  0.243444
2014-12-31  0.385028       NaN  0.385028
2015-01-31  0.823224 -0.103142  0.823224
2015-02-28  0.828245  0.139347  0.828245
2015-03-31  0.753291  0.456266  0.753291
2015-04-30  0.447670  0.308576  0.447670
2015-05-31  0.936667  0.561584  0.936667
2015-06-30  0.223049 -0.267589  0.223049
2015-07-31  0.933942  0.144259  0.933942
2015-08-31  0.325726  0.088886  0.325726
2015-09-30  0.947526  0.684281  0.947526
2015-10-31  0.524749 -0.022276  0.524749
2015-11-30  0.431671  0.188227  0.431671
2015-12-31  0.234028 -0.151000  0.234028


回答2:

This should do it:

def rebuild_diffed(series, first_element_original):
    cumsum = series.cumsum()
    return cumsum.fillna(0) + first_element_original

Step by step version:

# making some data 
a = pd.Series([2, 6, 4, 6, 2,])
print(a)
a_diff = a.diff()
print(a_diff)

# Rebuilding  
a_diff_cumsum = a_diff.cumsum()
print(a_diff_cumsum)

rebuilt = a_diff_cumsum.fillna(0) + 2
print(rebuilt)

print(rebuilt == a)