This question is an extension of a question I asked yesterday, but I will rephrase
Using a data frame and pandas, I am trying to figure out what the tip percentage is for each category in a group by.
So, using the tips database, I want to see, for each sex/smoker, what the tip percentage is is for female smoker / all female and for female non smoker / all female (and the same thing for men)
When I do this,
import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
I get the following:
total_bill tip
sex smoker
Female No 977.68 149.77
Yes 593.27 96.74
Male No 1919.75 302.00
Yes 1337.07 183.07
But I am looking for something more like this
Tip Pct
Female No 0.153189183
Yes 0.163062349
Male No 0.15731215
Yes 0.136918785
Where Tip Pct = sum(tip)/sum(total_bill) for each group
What am I doing wrong and how do I fix this? Thank you!
I understand that this would give me tip as a percentage of total tips:
(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x.sum()))
Is there a way to modify it to look at another column, i.e.
(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x['total_bill'].sum()))
Thanks!
You can use
apply
to loop through rows of the data frame (withaxis = 1
), where for each row you can access thetip
andtotal_bill
and divide them to get the percentage: