Calculating year over year growth by group in Pand

2020-06-23 02:20发布

问题:

I have the following dataframe:

In [1]: df
Out[1]: 

ID            Month           Transaction_Amount

1             2013/01         10
1             2013/02         20
1             2013/03         10
1             2013/04         20
1             2013/05         10
1             2013/06         20
1             2013/07         10
1             2013/08         20
1             2013/09         10
1             2013/10         20
1             2013/11         10
1             2013/12         20
1             2014/01         15
1             2014/02         25
1             2014/03         15
1             2014/04         25
...
1             2014/11         15
1             2014/12         25
...
10000000      2014/11         13
10000000      2014/12         23

What I would like to do is calculate the growth over rolling month periods year over year, so for example, I would want to find the value of (2014/01 - 2013/01) / (2014/01) which is (15 - 10) / (15) = 1/3 and save this for the first rolling period. There will be a total of 12 rolling periods for each ID. I'm thinking that the final output should look like:

In [2]: df_new
Out[2]: 

ID       rolling_period_1   rolling_period_2  ... rolling_period_12

1        .333333            .25                   .25
2        x1                 x2                    x12
3        y1                 y2                    y12
4        z1                 z2                    z12
...

I generated a list containing tuples of every year over period [(2013/01, 2014/01), (2013/02, 2014/02) ... (2013/12, 2014/12)] and have been playing around with isin to index a subset of the original df, but I am unsure how to arrive at the df_new.

EDIT

I have created a new dataframe called temp_df with the following code:

In [4]: temp_df = df[df['month'].isin(('2013/01','2014/01'))]

In [5]: temp_df
Out[5]:

ID            Month           Transaction_Amount

1             2013/01         10
1             2014/01         15
2             2013/01         20
2             2014/01         30
3             2013/01         15
3             2014/01         30
...

What I would like to produce is a DataFrame that looks like the following:

In [6]: new_df
Out[6]:

ID            Transaction_Growth

1             .3333   # (15-10)/15
2             .3333   # (30-20)/30
3             .50     # (30-15)/30
...

回答1:

you can use shift to offset the rows in the dataframe.

Create dummy data with month column and values column

rng = pd.date_range('1/1/2011', periods=90, freq='M')
df = pd.DataFrame({'value':range(1,91),'date':rng})

set the month column to index

df = df.set_index('date')

shift a copy of the dataframe 12 periods to get the value 12 months ago, minus from the current record, and divide by current record:

df - df.shift(12)/ df

update the sign on the period in the shift function

Updated to consider ID

# Create range of months
rng = pd.date_range('1/1/2011', periods=180, freq='M')
ID = np.array([1,2,3])

# Create ID column
ID = np.repeat(ID,60)

# Create dummy data in dataframe
df = pd.DataFrame({'ID':ID,'value':range(1,181),'date':rng})

# Use shift on a group by object
(df.value - df.groupby(['ID']).value.shift(12))/ df.value


回答2:

I think there is a much more simple way to do this that doesn't require keeping track of shifting time periods, try using the df.pct_change() method:

import pandas as pd
import numpy as np
date_range = pd.period_range("2016-01", "2018-01",freq='m')
df= pd.DataFrame({'A':np.random.rand(len(date_range))}, index=date_range)
df['pct_pop'] = df['A'].pct_change()
df['pct_yoy'] = df['A'].pct_change(12)
df

               A    pct_pop     pct_yoy
2016-01 0.478381    NaN NaN
2016-02 0.941450    0.967991    NaN
2016-03 0.128445    -0.863567   NaN
2016-04 0.498623    2.882011    NaN
2016-05 0.914663    0.834377    NaN
2016-06 0.349565    -0.617821   NaN
2016-07 0.563296    0.611419    NaN
2016-08 0.144055    -0.744264   NaN
2016-09 0.502279    2.486708    NaN
2016-10 0.621283    0.236928    NaN
2016-11 0.716813    0.153763    NaN
2016-12 0.152372    -0.787431   NaN
2017-01 0.160636    0.054234    -0.664209
2017-02 0.496759    2.092453    -0.472347
2017-03 0.324318    -0.347132   1.524965
2017-04 0.431651    0.330949    -0.134315
2017-05 0.973095    1.254357    0.063884
2017-06 0.007917    -0.991864   -0.977351
2017-07 0.875365    109.562870  0.554005
2017-08 0.860987    -0.016425   4.976784
2017-09 0.099549    -0.884378   -0.801805
2017-10 0.544275    4.467398    -0.123950
2017-11 0.433326    -0.203846   -0.395482
2017-12 0.688057    0.587850    3.515636
2018-01 0.924038    0.342967    4.752374