Pandas groupby sum

I have a dataframe as follows:

ref, type, amount
001, foo, 10
001, foo, 5
001, bar, 50
001, bar, 5
001, test, 100
001, test, 90
002, foo, 20
002, foo, 35
002, bar, 75
002, bar, 80
002, test, 150
002, test, 110

This is what I'm trying to get:

ref, type, amount, foo, bar, test
001, foo, 10, 15, 55, 190
001, foo, 5, 15, 55, 190
001, bar, 50, 15, 55, 190
001, bar, 5, 15, 55, 190
001, test, 100, 15, 55, 190
001, test, 90, 15, 55, 190
002, foo, 20, 55, 155, 260
002, foo, 35, 55, 155, 260
002, bar, 75, 55, 155, 260
002, bar, 80, 55, 155, 260
002, test, 150, 55, 155, 260
002, test, 110, 55, 155, 260

So I have this:

df.groupby('ref')['amount'].transform(sum)

That gives a single total per ref across all types. How can I restrict the sum so that each new column (foo, bar, test) only counts the rows of that type?
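For anyone who wants to reproduce this, here is a minimal sketch of the sample data and of what the attempt currently returns. The name `per_ref_total` is only for illustration, and `ref` is assumed to be stored as a string so the leading zeros survive.

```python
import pandas as pd

# Sample data from the question; ref is assumed to be a string column.
df = pd.DataFrame({
    "ref": ["001"] * 6 + ["002"] * 6,
    "type": ["foo", "foo", "bar", "bar", "test", "test"] * 2,
    "amount": [10, 5, 50, 5, 100, 90, 20, 35, 75, 80, 150, 110],
})

# The attempt: one total per ref, summed across every type.
per_ref_total = df.groupby("ref")["amount"].transform("sum")
print(per_ref_total.tolist())
```

Every row of a given ref gets the same grand total, which is why a separate column per type needs a different approach.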

Tags: python pandas merge group-by sum

2 Answers

Lonely孤独者°

Reply #2 · 2019-09-03 16:06

I think you need groupby with sum and unstack, then merge the result back onto the original DataFrame:

df1 = df.groupby(['ref','type'])['amount'].sum().unstack().reset_index()
print (df1)
type  ref  bar  foo  test
0     001   55   15   190
1     002  155   55   260

df = pd.merge(df, df1, on='ref')
print (df)
    ref  type  amount  bar  foo  test
0   001   foo      10   55   15   190
1   001   foo       5   55   15   190
2   001   bar      50   55   15   190
3   001   bar       5   55   15   190
4   001  test     100   55   15   190
5   001  test      90   55   15   190
6   002   foo      20  155   55   260
7   002   foo      35  155   55   260
8   002   bar      75  155   55   260
9   002   bar      80  155   55   260
10  002  test     150  155   55   260
11  002  test     110  155   55   260
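The steps above can be put together as a self-contained sketch. The sample frame is rebuilt from the question's data, `ref` is assumed to be a string column, and the names `sums` and `out` are only for illustration. The last step reorders the columns to match the layout the question asked for.

```python
import pandas as pd

# Sample data from the question; ref is assumed to be a string column.
df = pd.DataFrame({
    "ref": ["001"] * 6 + ["002"] * 6,
    "type": ["foo", "foo", "bar", "bar", "test", "test"] * 2,
    "amount": [10, 5, 50, 5, 100, 90, 20, 35, 75, 80, 150, 110],
})

# Sum per (ref, type), then move the type level into columns: one row per ref.
sums = df.groupby(["ref", "type"])["amount"].sum().unstack().reset_index()

# Broadcast the per-ref sums back onto every original row.
out = pd.merge(df, sums, on="ref")

# Reorder to the column order requested in the question.
out = out[["ref", "type", "amount", "foo", "bar", "test"]]
print(out)
```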

Timings:

In [506]: %timeit (pd.merge(df, df.groupby(['ref','type'])['amount'].sum().unstack().reset_index(), on='ref'))
100 loops, best of 3: 3.4 ms per loop

In [507]: %timeit (pd.merge(df, pd.pivot_table(df, values='amount', index=['ref'], columns=['type'], aggfunc=np.sum), left_on='ref', right_index=True))
100 loops, best of 3: 4.99 ms per loop


放荡不羁爱自由

Reply #3 · 2019-09-03 16:15

A solution using pivot_table:

>>> b = pd.pivot_table(df, values='amount', index=['ref'], columns=['type'], aggfunc=np.sum)
>>> b
type  bar  foo  test
ref
1      55   15   190
2     155   55   260

>>> pd.merge(df, b, left_on='ref', right_index=True)
    ref  type  amount  bar  foo  test
0     1   foo      10   55   15   190
1     1   foo       5   55   15   190
2     1   bar      50   55   15   190
3     1   bar       5   55   15   190
4     1  test     100   55   15   190
5     1  test      90   55   15   190
6     2   foo      20  155   55   260
7     2   foo      35  155   55   260
8     2   bar      75  155   55   260
9     2   bar      80  155   55   260
10    2  test     150  155   55   260
11    2  test     110  155   55   260
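As a script rather than a REPL session, the same idea might look like the sketch below. It assumes the question's sample data with `ref` stored as a string, and it passes `aggfunc="sum"` as a string instead of `np.sum`, since recent pandas versions may emit a warning for the NumPy callable. The name `out` is only for illustration.

```python
import pandas as pd

# Sample data from the question; ref is assumed to be a string column.
df = pd.DataFrame({
    "ref": ["001"] * 6 + ["002"] * 6,
    "type": ["foo", "foo", "bar", "bar", "test", "test"] * 2,
    "amount": [10, 5, 50, 5, 100, 90, 20, 35, 75, 80, 150, 110],
})

# One row per ref, one column per type, each cell the summed amount.
b = pd.pivot_table(df, values="amount", index="ref", columns="type", aggfunc="sum")

# Attach the per-ref sums to every original row by matching ref against b's index.
out = pd.merge(df, b, left_on="ref", right_index=True)
print(out)
```

Because the pivot keeps `ref` as its index, no `reset_index()` is needed before merging.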
