Getting both group size and min of column B groupi

2019-07-28 09:33发布

问题:

Given a DataFrame df, I can obtain the size of groups by column A with

df.groupby(['columnA']).size()

and the mimimun value of column B within each group with

df.groupby(['columnA']).agg({'columnB':min})

or (syntactic sugar I guess)

df.groupby(['columnA'])['columnB'].min()

but how can I obtain directly a DataFrame with such 2 columns?

In SQL, if you comfortable with it, this would be as simple as

SELECT count(columnA), min(columnB) FROM table GROUP BY columnA

Thanks in advance for any clue.

回答1:

Placing both of them in agg should work, since agg permits a list of functions.

>>> df
  columnA  columnB  columnC
0    cat1        3      400
1    cat1        2       20
2    cat1        5     3029
3    cat2        1      492
4    cat2        4       30
5    cat3        2      203
6    cat3        6      402
7    cat3        4      391

>>> df.groupby(['columnA']).columnB.agg(['size', 'min'])
         size  min
columnA           
cat1        3    2
cat2        2    1
cat3        3    2

>>> df.groupby(['columnA']).agg(['size', 'min'])
        columnB     columnC     
           size min    size  min
columnA                         
cat1          3   2       3   20
cat2          2   1       2   30
cat3          3   2       3  203

agg also accepts a dict of columns -> functions, and so were you to want to map a function to each column individually you could do so with something (not necessary here) like

df.groupby(['columnA']).agg({'columnA':'size','columnB':'min‌​'})