I aggregate my Pandas dataframe: data. Specifically, I want to get the average and sum of amount by tuples of [origin and type]. For averaging and summing I tried the functions below:
import numpy as np
import pandas as pd
result = data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum, pd.Series.mean]}).reset_index()
My issue is that the amount column includes NaNs, which causes the result of the above code to have a lot of NaN averages and sums.
I know both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaNs here?
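To make the behaviour easier to reproduce, here is a minimal sketch on a toy frame. The data is hypothetical (the real `data` and `groupbyvars` are not shown above), and it uses the string aliases "sum" and "mean" rather than `pd.Series.sum` and `pd.Series.mean`. In this sketch a NaN inside a group is skipped, but a group whose amounts are all NaN still has a NaN mean, since there is nothing to average. That may or may not be what is happening in the real data; it is one thing worth checking.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `data`.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B", "C"],
    "type":   ["x", "x", "y", "y", "z"],
    "amount": [1.0, np.nan, 2.0, 4.0, np.nan],
})
groupbyvars = ["origin", "type"]  # assumed grouping columns

# String aliases dispatch to the groupby sum/mean implementations.
result = data.groupby(groupbyvars).agg({"amount": ["sum", "mean"]}).reset_index()

# Group (A, x) has one NaN, which is skipped.
# Group (C, z) has only NaN, so its mean is NaN.
print(result)
```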
I also tried this, which obviously did not work:
data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()
EDIT:
At @Korem's suggestion, I also tried to use a partial as below:
from functools import partial
s_na_mean = partial(pd.Series.mean, skipna = True)
data.groupby(groupbyvars).agg({'amount': [ np.nansum, s_na_mean ]}).reset_index()
but I get this error:
error: 'functools.partial' object has no attribute '__name__'
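The error message suggests that something is looking up `__name__` on the aggregation function, and a `functools.partial` object does not have that attribute. A possible workaround, sketched below on hypothetical toy data, is to wrap the call in an ordinary named function instead. The name `na_mean` is made up for illustration, and whether this removes the NaNs depends on where they come from in the real data.

```python
from functools import partial

import numpy as np
import pandas as pd

# A partial object has no __name__ attribute of its own.
s_na_mean = partial(pd.Series.mean, skipna=True)
partial_has_name = hasattr(s_na_mean, "__name__")

# Hypothetical named wrapper; a plain function does have a __name__.
def na_mean(s):
    return s.mean(skipna=True)

# Hypothetical toy data standing in for the real `data`.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type":   ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 2.0, 4.0],
})
groupbyvars = ["origin", "type"]  # assumed grouping columns

# The named function's __name__ becomes the output column label.
result = data.groupby(groupbyvars).agg({"amount": ["sum", na_mean]}).reset_index()
print(result)
```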