When grouping a Pandas DataFrame, when should I use transform and when should I use aggregate? How do they differ with respect to their application in practice, and which one do you consider more important?
Answer 1:
Consider the DataFrame df:
import pandas as pd
df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))
groupby followed by a reduction such as mean is the standard way to aggregate. The result has one row per group:
df.groupby('A').mean()
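To make the shape of that result concrete, here is a minimal sketch using the same df. The variable name means is just for illustration, and the values in the comments follow from the four rows above.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Reducing each group to a single row: the index becomes the group keys.
means = df.groupby('A').mean()

# Group 'a' holds B = [1, 2] and C = [0, 9]; group 'b' holds B = [3, 4] and C = [0, 9].
print(means.loc['a', 'B'])  # 1.5
print(means.loc['b', 'B'])  # 3.5
print(means.shape)          # (2, 2): two groups, two value columns
```

Note that the original row index (0 to 3) is gone: the aggregated frame is indexed by the values of A.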
Maybe you want these values broadcast across the whole group, so that the result has the same index as what you started with. In that case, use transform:
df.groupby('A').transform('mean')
df.set_index('A').groupby(level='A').transform('mean')
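One place this tends to be useful in practice is combining the broadcast values with the original rows, for example to centre each value on its group mean. The following is a rough sketch with the same df; the names group_means, centered, B_centered, and by_level are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# One row per original row, aligned on the original index (0..3).
group_means = df.groupby('A').transform('mean')

# Because the index matches df, the result can be used directly in arithmetic.
centered = df.assign(B_centered=df['B'] - group_means['B'])
print(centered['B_centered'].tolist())  # [-0.5, 0.5, -0.5, 0.5]

# Grouping on an index level instead keeps A as the index of the result.
by_level = df.set_index('A').groupby(level='A').transform('mean')
print(by_level.index.tolist())  # ['a', 'a', 'b', 'b']
```

The key difference from the aggregated result is the length: transform returns as many rows as the input, so it lines up with df without a merge.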
agg is used when you want to run specific functions on different columns, or more than one function on the same column:
df.groupby('A').agg(['mean', 'std'])
df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
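As a sketch of what the dict form produces: mixing a single function with a list of functions gives two-level column labels. If flat column names are preferred, named aggregation (available in pandas 0.25 and later) is one alternative; it is not part of the answer above, and the output names B_sum, C_mean, and C_prod are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Different functions per column; the columns become (column, function) pairs.
result = df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
print(list(result.columns))  # [('B', 'sum'), ('C', 'mean'), ('C', 'prod')]

# Alternative (assumes pandas >= 0.25): named aggregation yields flat column names.
named = df.groupby('A').agg(
    B_sum=('B', 'sum'),
    C_mean=('C', 'mean'),
    C_prod=('C', 'prod'),
)
print(list(named.columns))  # ['B_sum', 'C_mean', 'C_prod']
```

Both forms still return one row per group, which is what separates agg from transform.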