Transform vs. aggregate in Pandas

Posted 2020-02-08 02:45

When grouping a Pandas DataFrame, when should I use transform and when should I use aggregate? How do they differ with respect to their application in practice and which one do you consider more important?

1 answer
我命由我不由天
#2 · 2020-02-08 03:28

Consider the DataFrame df:

import pandas as pd
df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))


Calling a reduction such as mean directly on the groupby is the standard way to aggregate. It returns one row per group:

df.groupby('A').mean()
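
To make the shape of the aggregated result concrete, here is a small sketch using the same df. The variable name agg_result is just for illustration:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation collapses each group to a single row.
agg_result = df.groupby('A').mean()

print(agg_result.shape)        # (2, 2): two groups, two value columns
print(list(agg_result.index))  # ['a', 'b']: the group keys become the index
print(agg_result.loc['a', 'B'])  # 1.5, the mean of B within group 'a'
```

The original four rows are gone; the index now holds the group labels.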


Sometimes you want these per-group values broadcast back to every row of the group, so that the result has the same index as the DataFrame you started with. Use transform for that:

df.groupby('A').transform('mean')

df.set_index('A').groupby(level='A').transform('mean')
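
Because the transformed result lines up with the original rows, a common use is to attach a group statistic as a new column or to center values within each group. A minimal sketch, where the names group_mean, B_group_mean and B_centered are made up for this example:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# transform returns one value per original row, aligned on the original index.
group_mean = df.groupby('A')['B'].transform('mean')

# The aligned result can be used directly in row-wise arithmetic.
df_out = df.assign(
    B_group_mean=group_mean,          # group mean repeated on every row of the group
    B_centered=df['B'] - group_mean,  # deviation of each row from its group mean
)

print(group_mean.tolist())            # [1.5, 1.5, 3.5, 3.5]
print(df_out['B_centered'].tolist())  # [-0.5, 0.5, -0.5, 0.5]
```

Doing the same with an aggregated result would need an extra merge or map step to get back to the original rows.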


agg (short for aggregate) is useful when you want to apply different functions to different columns, or more than one function to the same column.

df.groupby('A').agg(['mean', 'std'])

df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
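
The dict form above produces two-level column labels such as ('C', 'mean'). If flat column names are preferred, named aggregation is one alternative. This is not part of the original answer and assumes pandas 0.25 or later; the output names B_sum, C_mean and C_prod are invented for the example:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Dict form: the result has MultiIndex columns (column, function).
nested = df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
print(list(nested.columns))  # [('B', 'sum'), ('C', 'mean'), ('C', 'prod')]

# Named aggregation (assumes pandas >= 0.25): output_name=(column, function).
named = df.groupby('A').agg(
    B_sum=('B', 'sum'),
    C_mean=('C', 'mean'),
    C_prod=('C', 'prod'),
)
print(list(named.columns))     # ['B_sum', 'C_mean', 'C_prod']
print(named.loc['b', 'B_sum'])  # 7, i.e. 3 + 4
```

Both forms still return one row per group; only the column labels differ.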
