Python Pandas - turn absolute periods into relativ

2019-08-06 12:37发布

I have a dataframe that I want to use to calculate rolling sums relative to an event date. The event date is different for each column and is represented by the latest date in which there is a value in each column.

Here is a toy example:

rng = pd.date_range('1/1/2011', periods=8, freq='D')
df = pd.DataFrame({
            '1' : [56, 2, 3, 4, 5, None, None, None],
            '2' : [51, 2, 3, 4, 5, 6, None, None],
            '3' : [51, 2, 3, 4, 5, 6, 0, None]}, index = rng)

pd.rolling_sum(df,3)

The dataframe it produces looks like this:

            1       2       3
2011-01-01  NaN     NaN     NaN
2011-01-02  NaN     NaN     NaN
2011-01-03  61      56      56
2011-01-04  9       9       9
2011-01-05  12      12      12
2011-01-06  NaN     15      15
2011-01-07  NaN     NaN     11
2011-01-08  NaN     NaN     NaN

I now want to align the last event dates on the final row of the dataframe and set the index to 0 with each preceding row index -1,-2,-3 and so on. The periods no longer being absolute but relative to the event date.

The desired dataframe would look like this:

    1   2   3
-7.00   NaN NaN NaN
-6.00   NaN NaN NaN
-5.00   NaN NaN NaN
-4.00   NaN NaN 56
-3.00   NaN 56  9
-2.00   61  9   12
-1.00   9   12  15
0.00    12  15  11

Thanks for any guidance.

1条回答
Melony?
2楼-- · 2019-08-06 13:12

I don't see any easy ways to do this. The following will work, but a bit messy.

In [37]: def f(x):
   ....:     y = x.dropna()
   ....:     return Series(y.values,x.index[len(x)-len(y):])
   ....: 

In [40]: roller = pd.rolling_sum(df,3).reset_index(drop=True)

In [41]: roller
Out[41]: 
    1   2   3
0 NaN NaN NaN
1 NaN NaN NaN
2  61  56  56
3   9   9   9
4  12  12  12
5 NaN  15  15
6 NaN NaN  11
7 NaN NaN NaN

[8 rows x 3 columns]

In [43]: roller.apply(f).reindex_like(roller)
Out[43]: 
    1   2   3
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN  56
4 NaN  56   9
5  61   9  12
6   9  12  15
7  12  15  11

[8 rows x 3 columns]

In [44]: result = roller.apply(f).reindex_like(roller)

In [49]: result.index = result.index.values-len(result.index)+1

In [50]: result
Out[50]: 
     1   2   3
-7 NaN NaN NaN
-6 NaN NaN NaN
-5 NaN NaN NaN
-4 NaN NaN  56
-3 NaN  56   9
-2  61   9  12
-1   9  12  15
 0  12  15  11

[8 rows x 3 columns]
查看更多
登录 后发表回答