Example dataframe:
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['group1', 'group2', 'group3'] * 2,
'B': rand.rand(6),
'C': rand.rand(6),
'D': rand.rand(6)})
print df
A B C D
0 group1 0.417022 0.186260 0.204452
1 group2 0.720324 0.345561 0.878117
2 group3 0.000114 0.396767 0.027388
3 group1 0.302333 0.538817 0.670468
4 group2 0.146756 0.419195 0.417305
5 group3 0.092339 0.685220 0.558690
Groupby column A
group = df.groupby('A')
Use agg to return max value for each group
max1 = group['B'].agg({'max' : np.max})
print max1
max
A
group1 0.417022
group2 0.720324
group3 0.092339
But I would like to retain (or get back) the appropriate data in the other columns, C and D. This would be the remaining data for the row which contained the max value. So, the return should be:
A B C D
group1 0.417022 0.186260 0.204452
group2 0.720324 0.345561 0.878117
group3 0.092339 0.685220 0.558690
Can anybody show how to do this? Any help appreciated.
Two stages: first find indices, then lookup all the rows.
My answer is similar to FooBar but is done in one line by using idmax()
Result is the same: