I need to reproduce with pandas what SQL does so easily:
select
del_month
, sum(case when off0_on1 = 1 then 1 else 0 end) as on1
, sum(case when off0_on1 = 0 then 1 else 0 end) as off0
from a1
group by del_month
order by del_month
Here is a sample, illustrative pandas dataframe to work on:
a1 = pd.DataFrame({'del_month':[1,1,1,1,2,2,2,2], 'off0_on1':[0,0,1,1,0,1,1,1]})
Here are my attempts to reproduce the above SQL with pandas. The first line works. The second line gives an error:
a1['on1'] = a1.groupby('del_month')['off0_on1'].transform(sum)
a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(sum(lambda x: 1 if x == 0 else 0))
Here's the second line's error:
TypeError: 'function' object is not iterable
This previous question of mine had a problem with the lambda function, which was solved. The bigger problem is how to reproduce SQL's "sum(case when)" logic on grouped data. I'm looking for a general solution, since I need to do this sort of thing often. The answers in my previous question suggested using map() inside the lambda function, but the following results for the "off0" column are not what I need. The "on1" column is what I want. The answer should be the same for the whole group (i.e. "del_month").
Using get_dummies would only need a single groupby call, which is simpler. Additionally, for the case of aggregation, call sum directly instead of using apply:
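The original snippet for this suggestion is not shown, so the following is only a sketch of what a get_dummies approach could look like on the sample frame from the question. The column names off0 and on1 are taken from the SQL aliases; the rename mapping and the astype(int) step are my own assumptions, added so the sums come out as integers regardless of pandas version.

```python
import pandas as pd

a1 = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1':  [0, 0, 1, 1, 0, 1, 1, 1]})

# One indicator column per distinct value of off0_on1 (here: 0 and 1).
dummies = (
    pd.get_dummies(a1['off0_on1'])
      .astype(int)                            # assumption: force 0/1 integers
      .rename(columns={0: 'off0', 1: 'on1'})  # names borrowed from the SQL aliases
)

# A single groupby, calling sum directly rather than going through apply.
result = dummies.groupby(a1['del_month']).sum()
print(result)
```

Each indicator column is 1 where the condition holds, so summing per group is the same count that sum(case when ...) produces.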
Simply sum the Trues in your conditional logic expressions:
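As a rough illustration of summing the Trues (not necessarily the exact code the answer had in mind), each CASE WHEN condition can be written as a boolean comparison and summed per group. The use of assign to attach the two temporary boolean columns is my own choice here.

```python
import pandas as pd

a1 = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1':  [0, 0, 1, 1, 0, 1, 1, 1]})

result = (
    a1.assign(on1=a1['off0_on1'] == 1,    # True where the "case when" condition holds
              off0=a1['off0_on1'] == 0)
      .groupby('del_month')[['on1', 'off0']]
      .sum()                               # True counts as 1, False as 0
)
print(result)
```

The same pattern should carry over to any condition that can be expressed as a boolean Series, which is what makes it a reasonably general stand-in for sum(case when ...).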
Similarly, you can do the same in SQL if your dialect supports it, which most should:
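To try the SQL form without a database server, here is a small sketch that runs it through Python's built-in sqlite3 module. SQLite is chosen only because it evaluates a comparison to 0 or 1; other dialects may need a cast or the original CASE expression. The in-memory table simply mirrors the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table a1 (del_month integer, off0_on1 integer)')
rows = [(1, 0), (1, 0), (1, 1), (1, 1), (2, 0), (2, 1), (2, 1), (2, 1)]
conn.executemany('insert into a1 values (?, ?)', rows)

# In SQLite a comparison yields 1 or 0, so it can be summed directly.
query = """
select
    del_month
  , sum(off0_on1 = 1) as on1
  , sum(off0_on1 = 0) as off0
from a1
group by del_month
order by del_month
"""
result = conn.execute(query).fetchall()
conn.close()
print(result)
```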
And to replicate the above SQL in pandas, don't use transform but send multiple aggregates in a groupby().apply() call:
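One possible way to write this, as a sketch: the helper name sum_case_when is hypothetical, and selecting [['off0_on1']] before apply is my own addition so the function only sees the column it needs.

```python
import pandas as pd

a1 = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1':  [0, 0, 1, 1, 0, 1, 1, 1]})

def sum_case_when(g):
    # g is the sub-DataFrame for one del_month; return one value per aggregate.
    return pd.Series({
        'on1':  (g['off0_on1'] == 1).sum(),
        'off0': (g['off0_on1'] == 0).sum(),
    })

result = (
    a1.groupby('del_month')[['off0_on1']]
      .apply(sum_case_when)
      .reset_index()        # turn del_month back into a regular column
)
print(result)
```

This returns one row per del_month, like the SQL query, whereas transform repeats the group value on every original row.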