Resample in a rolling window using pandas

2019-05-03 11:23发布

问题:

Assume I have daily data (not regularly spaced), I want to compute for each month the moving standard deviation (or an arbitrarily non linear function) in the past 5 months. For example, for May 2012 I would compute the stddev from the period starting from Jan 2012 to May 2012 (5 months). For June 2012 the period starts in Feb 2012, etc. The final result is a time series with monthly values.

I cannot apply a rolling window because this would first be daily and secondly I need to specify the number of values (a rolling window does not aggregate by time frame, some posts addressed this issue but they are not relevant to my problem as the rolling would still be for each new day).

I cannot apply resampling, because then the sample would be every 5 months, e..g I would only have values for May 2012, Oct 2012, March 2013... Finally, as the function is not linear I cannot reconstruct it by first doing a monthly sample and then applying a 5 period rolling window on it.

So I would need a sort of resampling functionality applied to a rolling window defined by time interval (not number of values).

How can I do this in pandas? One approach could be to combine several (5 in this example) resampled (5 months) time series, each with one month of offset and then align all these series into one... but I do not know how to implement this.

回答1:

I had a similar issue dealing with a timedelta series where I wanted to take a moving average and then resample. Here is an example where I have 100 seconds of data. I take a rolling average of 10 second windows and then resample for every 5 seconds, taking the first entry in each resample bin. The result should be the previous 10 second average at 5 second increments. You could do something similar with month format instead of seconds:

df = pd.DataFrame(range(0,100), index=pd.TimedeltaIndex(range(0,100),'s'))
df.rolling('10s').mean().resample('5s').first()

Result:

             0
00:00:00   0.0
00:00:05   2.5
00:00:10   5.5
00:00:15  10.5
00:00:20  15.5
00:00:25  20.5
00:00:30  25.5
00:00:35  30.5
00:00:40  35.5
00:00:45  40.5
00:00:50  45.5
00:00:55  50.5
00:01:00  55.5
00:01:05  60.5
00:01:10  65.5
00:01:15  70.5
00:01:20  75.5
00:01:25  80.5
00:01:30  85.5
00:01:35  90.5


回答2:

Here's an attempt - not super clean, but it might work.

Dummy data:

df = pd.DataFrame(data={'a': 1.}, 
                  index=pd.date_range(start='2001-1-1', periods=1000))

First define a function to decrease a date n number of months. This needs to be cleaned up, but works for n<=12.

from datetime import datetime    
def decrease_month(date, n):
    assert(n <= 12)

    new_month = date.month - n
    year_offset = 0
    if new_month <= 0:
        year_offset = -1
        new_month = 12 + new_month

    return datetime(date.year + year_offset, new_month, 1)

Then, add 5 new columns for the 5 rolling periods that each date will cross.

for n in range(rolling_period):
    df['m_' + str(n)] = df.index.map(lambda x: decrease_month(x, n))

Then - use the melt function to convert the data from wide to long, so each rolling period will have one entry.

df_m = pd.melt(df, id_vars='a')

You should be able to groupby the newly created column, and each date will represent the right 5 month rolling period.

In [222]: df_m.groupby('value').sum()
Out[222]: 
              a
value          
2000-09-01   31
2000-10-01   59
2000-11-01   90
2000-12-01  120
2001-01-01  151
2001-02-01  150
2001-03-01  153
2001-04-01  153
2001-05-01  153
2001-06-01  153
2001-07-01  153
...


回答3:

I have solved a similar problem with the following code:

interval = 5
frames = []
for base in range(interval):
  frame = data.resample(f"{interval}min", base=base).last()
  frames.append(frame)

pd.concat(frames, axis=0).sort_index()

Here I create 5 data frames which are resampled at the same interval, but have different offsets (the base parameter). Then I just have to concatenate and sort them. Should usually be much more efficient than rolling + resampling (the only overhead is the sorting).