How to reference groupby index when using apply, t

Posted 2019-05-30 16:30

To be concrete, say we have two DataFrames:

df1:

    date    A
0   12/1/14 3
1   12/1/14 1
2   12/3/14 2
3   12/3/14 3
4   12/3/14 4
5   12/6/14 5

df2:

        B
12/1/14 10
12/2/14 20
12/3/14 10
12/4/14 30
12/5/14 10
12/6/14 20

Now I want to group df1 by date, sum A within each group, and then normalize that sum by the value of B in df2 for the corresponding date. Something like this:

df1.groupby('date').agg(lambda x: np.sum(x)/df2.loc[x.date,'B'])

The problem is that none of aggregate, apply, or transform can reference the index. Any idea how to work around this?

2 answers
Animai°情兽
Reply #2 · 2019-05-30 17:23
df_grouped = df1.groupby('date').sum()
print(df_grouped['A'] / df2['B'].astype(float))
date
12/1/14    0.40
12/2/14     NaN
12/3/14    0.90
12/4/14     NaN
12/5/14     NaN
12/6/14    0.25
dtype: float64
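
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It assumes df1 and df2 are rebuilt from the tables in the question, with the dates kept as plain strings and df2 indexed by date as printed there. The names sums and ratio are only illustrative. The division aligns on the index, so dates that appear only in df2 come out as NaN and can be dropped.

```python
import pandas as pd

# Rebuild the question's frames (assumption: dates are plain strings)
df1 = pd.DataFrame({
    "date": ["12/1/14", "12/1/14", "12/3/14", "12/3/14", "12/3/14", "12/6/14"],
    "A": [3, 1, 2, 3, 4, 5],
})
df2 = pd.DataFrame(
    {"B": [10, 20, 10, 30, 10, 20]},
    index=["12/1/14", "12/2/14", "12/3/14", "12/4/14", "12/5/14", "12/6/14"],
)

# Sum A per date; the result is a Series indexed by date
sums = df1.groupby("date")["A"].sum()

# Series division aligns on the index; dates missing from df1 give NaN
ratio = (sums / df2["B"]).dropna()
print(ratio)
```

Dropping the NaN rows is optional; keep them if you need to see which dates in df2 had no rows in df1.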
Bombasti
Reply #3 · 2019-05-30 17:25

When you call .groupby('column'), that column becomes the index of the grouped result, and it is accessible through the .index property.

So, in your case, assuming that date is NOT part of the index in either DataFrame, this should work:

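def f(x):
    # x holds one date's rows; with date as the index, x.index[0] is that date
    return x.sum() / df2.set_index('date').loc[x.index[0], 'B']

# Move date into the index, then group on that index level
df1.set_index('date').groupby(level='date').apply(f)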

This produces:

               A
date            
2014-01-12  0.40
2014-03-12  0.90
2014-06-12  0.25

If date is already the index of df2, just use df2.loc[x.index[0], 'B'] in the above code.

If date is already the index of df1, change the last line to df1.groupby(level='date').apply(f).
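
A possible alternative that avoids moving date into the index at all: as far as I know, pandas sets a name attribute on each group it passes to apply, holding that group's key. The sketch below relies on that, and assumes df1 and df2 are rebuilt from the question's tables with string dates and with df2 indexed by date. The names normalize and result are made up for this example.

```python
import pandas as pd

# Rebuild the question's frames (assumption: dates are plain strings)
df1 = pd.DataFrame({
    "date": ["12/1/14", "12/1/14", "12/3/14", "12/3/14", "12/3/14", "12/6/14"],
    "A": [3, 1, 2, 3, 4, 5],
})
df2 = pd.DataFrame(
    {"B": [10, 20, 10, 30, 10, 20]},
    index=["12/1/14", "12/2/14", "12/3/14", "12/4/14", "12/5/14", "12/6/14"],
)

def normalize(s):
    # s is the A column for one date; s.name is the group key (the date)
    return s.sum() / df2.loc[s.name, "B"]

# Selecting column A first means each group is a Series of A values
result = df1.groupby("date")["A"].apply(normalize)
print(result)
```

Only dates present in df1 appear in the result, so there are no NaN rows to clean up afterwards.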
