Given a DataFrame df
, I can obtain the size of groups by column A with
df.groupby(['columnA']).size()
and the mimimun value of column B within each group with
df.groupby(['columnA']).agg({'columnB':min})
or (syntactic sugar I guess)
df.groupby(['columnA'])['columnB'].min()
but how can I obtain directly a DataFrame with such 2 columns?
In SQL, if you comfortable with it, this would be as simple as
SELECT count(columnA), min(columnB) FROM table GROUP BY columnA
Thanks in advance for any clue.
Placing both of them in agg
should work, since agg
permits a list of functions.
>>> df
columnA columnB columnC
0 cat1 3 400
1 cat1 2 20
2 cat1 5 3029
3 cat2 1 492
4 cat2 4 30
5 cat3 2 203
6 cat3 6 402
7 cat3 4 391
>>> df.groupby(['columnA']).columnB.agg(['size', 'min'])
size min
columnA
cat1 3 2
cat2 2 1
cat3 3 2
>>> df.groupby(['columnA']).agg(['size', 'min'])
columnB columnC
size min size min
columnA
cat1 3 2 3 20
cat2 2 1 2 30
cat3 3 2 3 203
agg
also accepts a dict of columns -> functions, and so were you to want to map a function to each column individually you could do so with something (not necessary here) like
df.groupby(['columnA']).agg({'columnA':'size','columnB':'min'})