To be concrete, say we have two DataFrames:
df1:
date A
0 12/1/14 3
1 12/1/14 1
2 12/3/14 2
3 12/3/14 3
4 12/3/14 4
5 12/6/14 5
df2:
B
12/1/14 10
12/2/14 20
12/3/14 10
12/4/14 30
12/5/14 10
12/6/14 20
Now I want to group df1 by date, sum the values of A in each group, and then normalize each sum by the value of B in df2 for the corresponding date. Something like this:
df1.groupby('date').agg(lambda x: np.sum(x)/df2.loc[x.date,'B'])
The problem is that neither aggregate, apply, nor transform can refer to the index. Any idea how to work around this?
When you call .groupby('column'), it makes column part of the DataFrameGroupBy index, and it is accessible through the .index property.
So, in your case, assuming that date is NOT part of the index in either df, this should work:
This produces:
If date is in the index of df2, just use df2.loc[x.index[0], 'B'] in the above code.
If date is in df1.index, change the last line to df1.groupby(level='date').apply(f).
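As a minimal sketch of the approach, using the data from the question (date as a column of df1 and as the index of df2): inside a SeriesGroupBy apply, each group's .name attribute holds the group key, so it can be used to look up the matching row of df2. The helper name normalized_sum is made up for illustration, and this is only one possible way to write it.

```python
import pandas as pd

# Data from the question: 'date' is a column of df1 and the index of df2.
df1 = pd.DataFrame({
    "date": ["12/1/14", "12/1/14", "12/3/14", "12/3/14", "12/3/14", "12/6/14"],
    "A": [3, 1, 2, 3, 4, 5],
})
df2 = pd.DataFrame(
    {"B": [10, 20, 10, 30, 10, 20]},
    index=["12/1/14", "12/2/14", "12/3/14", "12/4/14", "12/5/14", "12/6/14"],
)

def normalized_sum(group):
    # Hypothetical helper: group.name is the group key (here, the date),
    # which is used to pick the matching value of B from df2.
    return group.sum() / df2.loc[group.name, "B"]

# Case 1: 'date' is a regular column of df1.
by_column = df1.groupby("date")["A"].apply(normalized_sum)

# Case 2: 'date' is the index of df1, so group by index level instead.
by_level = df1.set_index("date").groupby(level="date")["A"].apply(normalized_sum)

# Alternative without a custom function: sum first, then divide by the
# rows of df2 that correspond to the dates present in df1.
sums = df1.groupby("date")["A"].sum()
aligned = sums / df2["B"].reindex(sums.index)

print(by_column)
```

All three variants should give the same per-date values. The last one avoids calling a Python function per group, which may matter on larger frames.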