Often when we perform groupby operations using pandas, we may wish to apply several functions across multiple series. groupby.agg seems the natural way to perform these groupings and calculations.
However, there seems to be a discrepancy between how groupby.agg and groupby.apply are implemented, because I cannot group to a list using agg. Tuple and set both work fine, which suggests to me that you can only aggregate to immutable types via agg. Via groupby.apply, I can aggregate one series to a list directly with no issues.
Below is a complete example. Calls (1), (2) and (3) complete successfully. Call (4) fails with ValueError: Function does not reduce.
import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

def grouper(df, func):
    # Apply func to 'VALUE A' and keep the last value of the other two columns.
    f = {'VALUE A': lambda x: func(x), 'VALUE B': 'last', 'VALUE C': 'last'}
    # Select the columns with a list (double brackets); selecting with a bare
    # tuple of names is not accepted by newer pandas versions.
    return df.groupby(['NAME', 'DATE', 'TYPE'])[['VALUE A', 'VALUE B', 'VALUE C']]\
             .agg(f).reset_index()

# (1) SUCCESS
grouper(df, set)

# (2) SUCCESS
grouper(df, tuple)

# (3) SUCCESS
df.groupby(['NAME', 'DATE', 'TYPE', 'VALUE B', 'VALUE C'])['VALUE A']\
  .apply(list).reset_index()

# (4) FAIL
grouper(df, list)
# AttributeError
# ValueError: Function does not reduce
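For comparison, here is a minimal sketch of one possible workaround. It assumes the tuple path from (2) works in the pandas version at hand. It does not explain why (4) fails; it only sidesteps it by aggregating 'VALUE A' to a tuple inside agg and converting each tuple to a list afterwards. The names agg_spec and result are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

# Hypothetical name: the same per-column spec as in grouper(), but fixed to tuple.
agg_spec = {'VALUE A': lambda x: tuple(x), 'VALUE B': 'last', 'VALUE C': 'last'}

result = (df.groupby(['NAME', 'DATE', 'TYPE'])[['VALUE A', 'VALUE B', 'VALUE C']]
            .agg(agg_spec)
            .reset_index())

# Convert each aggregated tuple to a list after the aggregation has finished.
result['VALUE A'] = result['VALUE A'].map(list)

print(result)
```

This keeps the several-functions-in-one-agg pattern from grouper() and only adds a single post-processing step on the one column that needs lists.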