Python pandas equivalent to R groupby mutate

In R, when I have a data frame df with, say, 4 columns and I want to compute each value's ratio to the sum product of its group, I can do it like this:

library(dplyr)  # provides %>%, group_by() and mutate()
# generate data
df = data.frame(a=c(1,1,0,1,0),b=c(1,0,0,1,0),c=c(10,5,1,5,10),d=c(3,1,2,1,2));
| a   b   c    d |
| 1   1   10   3 |
| 1   0   5    1 |
| 0   0   1    2 |
| 1   1   5    1 |
| 0   0   10   2 |
# compute sum product ratio
df = df%>% group_by(a,b) %>%
      mutate(
          ratio=c/sum(c*d)
      );
| a   b   c    d  ratio |
| 1   1   10   3  0.286 |
| 1   1   5    1  0.143 |
| 1   0   5    1  1     |
| 0   0   1    2  0.045 |
| 0   0   10   2  0.455 |

But in Python I need to resort to loops. I know there should be a more elegant way than raw loops in Python. Does anyone have any ideas?

Tags: python r pandas dplyr

2 Answers

等我变得足够好

#2 · 2020-05-19 06:15

According to this thread on the pandas GitHub, we can use the transform() method to replicate the combination of dplyr::group_by() and dplyr::mutate(). For this example, it would look as follows:

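import pandas as pd

df = pd.DataFrame(
    dict(
        a=(1 , 1, 0, 1, 0 ),
        b=(1 , 0, 0, 1, 0 ),
        c=(10, 5, 1, 5, 10),
        d=(3 , 1, 2, 1, 2 ),
    )
).assign(
    # row-wise product, used as a helper column for the group sum
    prod_c_d = lambda x: x['c'] * x['d'],
    # transform('sum') returns the group total repeated for every row of the group
    ratio    = lambda x: x['c'] / x.groupby(['a','b'])['prod_c_d'].transform('sum')
)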

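This example uses pandas method chaining. For more information on how to use method chaining to replicate dplyr workflows, see this blog post.

The approach using groupby() and apply() does not work for me because it does not seem to be adaptable. For example, it stops working if we delete g.c/ from the lambda expression. The lambda then returns one scalar per group, so apply() returns a Series indexed by the group keys (a, b) rather than by the rows of df, and the assignment no longer lines up with the original rows: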
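The helper column is not strictly required. As a minimal sketch, assuming the same five-row example data, the row-wise product can be grouped directly by passing the key columns as Series; the name `group_total` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 0, 1, 0],
    "b": [1, 0, 0, 1, 0],
    "c": [10, 5, 1, 5, 10],
    "d": [3, 1, 2, 1, 2],
})

# Group the row-wise product c*d by (a, b); transform("sum") returns a Series
# with the same index as df, holding each row's group total.
group_total = (df["c"] * df["d"]).groupby([df["a"], df["b"]]).transform("sum")

# Element-wise division, aligned on the original row index.
df["ratio"] = df["c"] / group_total

print(df.round(3))
```

Because transform() keeps the original index, the rows stay in their input order, which matches how dplyr's mutate() behaves on a grouped data frame.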

df['ratio'] = df.groupby(['a','b'], group_keys=False)\
    .apply(lambda g: (g.c * g.d).sum() )
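If a per-group scalar is what you have, one way to get it back onto the rows is to aggregate first and then merge on the group keys. The following is a sketch under the same example data, not the only way to do it; the names `cd`, `group_sums` and `merged` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 0, 1, 0],
    "b": [1, 0, 0, 1, 0],
    "c": [10, 5, 1, 5, 10],
    "d": [3, 1, 2, 1, 2],
})

# One row per (a, b) group, holding that group's sum of c*d.
group_sums = (
    df.assign(cd=df["c"] * df["d"])
      .groupby(["a", "b"], as_index=False)["cd"]
      .sum()
)

# A left merge on the keys repeats each group's total on every row of that group
# and keeps the row order of df.
merged = df.merge(group_sums, on=["a", "b"], how="left")
merged["ratio"] = merged["c"] / merged["cd"]

print(merged.round(3))
```

This is more verbose than transform('sum'), but it makes the two steps (aggregate, then broadcast back to rows) explicit.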


男人必须洒脱

#3 · 2020-05-19 06:22

It can be done with similar syntax using groupby() and apply():

df['ratio'] = df.groupby(['a','b'], group_keys=False).apply(lambda g: g.c/(g.c * g.d).sum())

