Pandas Grouping - Values as Percent of Grouped Tot

2019-07-24 17:45发布

This question is an extension of a question I asked yesterday, but I will rephrase

Using a data frame and pandas, I am trying to figure out what the tip percentage is for each category in a group by.

So, using the tips database, I want to see, for each sex/smoker, what the tip percentage is is for female smoker / all female and for female non smoker / all female (and the same thing for men)

When I do this,

import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()

I get the following:

        total_bill  tip
sex smoker      
Female  No  977.68  149.77
        Yes 593.27  96.74
Male    No  1919.75 302.00
        Yes 1337.07 183.07

But I am looking for something more like this

        Tip Pct
Female  No  0.153189183
        Yes 0.163062349
Male    No  0.15731215
        Yes 0.136918785

Where Tip Pct = sum(tip)/sum(total_bill) for each group

What am I doing wrong and how do I fix this? Thank you!

I understand that this would give me tip as a percentage of total tips:

(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x.sum()))

Is there a way to modify it to look at another column, i.e.

(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x['total_bill'].sum()))

Thanks!

1条回答
▲ chillily
2楼-- · 2019-07-24 18:27

You can use apply to loop through rows of the data frame (with axis = 1), where for each row you can access the tip and total_bill and divide them to get the percentage:

(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
   .apply(lambda r: r.tip/r.total_bill, axis = 1))

#sex     smoker
#Female  No        0.153189
#        Yes       0.163062
#Male    No        0.157312
#        Yes       0.136919
#dtype: float64
查看更多
登录 后发表回答