- How to perform aggregation with pandas?
- No DataFrame after aggregation! What happened?
- How to aggregate mainly string columns (to `list`s, `tuple`s, strings with separator)?
- How to aggregate counts?
- How to create a new column filled with aggregated values?
I've seen these recurring questions asking about various facets of the pandas aggregate functionality. Most of the information regarding aggregation and its various use cases today is fragmented across dozens of badly worded, unsearchable posts. The aim here is to collate some of the more important points for posterity.
This Q/A is meant to be the next instalment in a series of helpful user-guides:
- How to pivot a dataframe,
- Pandas concat
- How do I operate on a DataFrame with a Series for every column
- Pandas Merging 101
Please note that this post is not meant to be a replacement for the documentation about aggregation and about groupby, so please read that as well!
Question 1
How to perform aggregation with pandas?
Expanded aggregation documentation.
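Aggregating functions are the ones that reduce the dimension of the returned objects. This means the output Series/DataFrame has the same number of rows as the original, or fewer. Some common aggregating functions are `mean()`, `sum()`, `size()`, `count()`, `std()`, `var()`, `min()`, `max()`, `first()` and `last()`.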
Aggregation by filtered columns and Cython-implemented functions:
The aggregate function is applied to all columns not specified in the `groupby` function (here the grouping columns are `A` and `B`).
You can also specify only some columns to aggregate, in a list after the `groupby` function.
The same results can be obtained with the function `DataFrameGroupBy.agg`:
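As a minimal sketch of these three variants, assume a small made-up frame with key columns `A` and `B` and numeric columns `C` and `D` (the data and the names `df1`, `df2`, `df3` are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 9],
    "D": [10, 20, 30, 40, 50, 60],
})

# Sum every column that is not a grouping key (here C and D).
df1 = df.groupby(["A", "B"], as_index=False).sum()

# Aggregate only the columns listed after groupby (here just C).
df2 = df.groupby(["A", "B"], as_index=False)[["C"]].sum()

# Same result as df1, spelled with DataFrameGroupBy.agg.
df3 = df.groupby(["A", "B"], as_index=False).agg("sum")

print(df1)
```

Built-in names such as `"sum"` are dispatched to pandas' optimized implementations, which is usually faster than passing an equivalent Python function.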
For multiple functions applied to one column, use a list of `tuple`s holding the names of the new columns and the aggregating functions.
If you want to pass multiple functions, you can pass a `list` of `tuple`s.
You then get a `MultiIndex` in the columns.
To convert it to plain columns, flatten the `MultiIndex` using `map` with `join`:
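A short sketch of the tuple form and the flattening step, again on made-up data (the names `average`, `total`, `one_col`, `multi` and `flat` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 9],
    "D": [10, 20, 30, 40, 50, 60],
})

# One column, several functions: each tuple is (new column name, function).
one_col = df.groupby("A")["C"].agg([("average", "mean"), ("total", "sum")])

# Several columns: the same list of tuples yields a MultiIndex in the columns.
multi = df.groupby("A")[["C", "D"]].agg([("average", "mean"), ("total", "sum")])
print(multi.columns)

# Flatten the MultiIndex by joining each (column, name) pair with "_".
flat = multi.copy()
flat.columns = multi.columns.map("_".join)
print(flat)
```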
Another solution is to pass a list of aggregate functions, then flatten the `MultiIndex`, and use `str.replace` to change the column names.
If you want to specify the aggregating function for each column separately, pass a `dictionary`:
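Both variants might look like this on a small made-up frame (the replacement string `"average"` and the variable names are illustrative choices):

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 9],
    "D": [10, 20, 30, 40, 50, 60],
})

# List of function names, then flatten the MultiIndex and rename via str.replace.
wide = df.groupby("A")[["C", "D"]].agg(["mean", "sum"])
wide.columns = wide.columns.map("_".join).str.replace("mean", "average", regex=False)

# Dictionary: one aggregating function per column.
per_col = df.groupby("A").agg({"C": "sum", "D": "mean"})

print(wide)
print(per_col)
```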
You can also pass a custom function:
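For instance, a hypothetical helper `value_range` (not part of pandas) that returns the spread of each group could be passed like this:

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 9],
})

def value_range(x):
    """Illustrative custom aggregation: max minus min of one group."""
    return x.max() - x.min()

# The function receives each group's values as a Series and must return a scalar.
spread = df.groupby("A")["C"].agg(value_range)
print(spread)
```

Custom Python functions are generally slower than the built-in names, so they are best kept for cases the built-ins do not cover.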
Question 2
No DataFrame after aggregation! What happened?
Aggregation by 2 or more columns:
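A minimal reproduction of the situation, on made-up data: grouping by two columns and aggregating a single selected column.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 9],
})

# Selecting one column before aggregating gives a Series, not a DataFrame;
# the grouping keys A and B end up in the index.
result = df.groupby(["A", "B"])["C"].sum()
print(result)
```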
First check the `Index` and `type` of the pandas object.
There are two ways to turn a `MultiIndex` `Series` into columns:
- add the parameter `as_index=False` to `groupby`
- use `Series.reset_index`
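The check and both fixes can be sketched as follows, assuming the same kind of made-up frame (variable names are illustrative):

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 9],
})

s = df.groupby(["A", "B"])["C"].sum()

# Inspect what came back: a Series carrying a MultiIndex built from A and B.
print(type(s))
print(s.index)

# Fix 1: keep the grouping keys as regular columns.
fixed_a = df.groupby(["A", "B"], as_index=False)["C"].sum()

# Fix 2: move the index levels of the Series back into columns.
fixed_b = s.reset_index()

print(fixed_a)
```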
If you group by one column, you get a `Series` with an `Index`.
The solution is the same as for the `MultiIndex` `Series`:
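A sketch of the single-key case, with made-up data and illustrative variable names:

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 9],
})

# One grouping key: the result is a Series whose index is named A.
s = df.groupby("A")["C"].sum()
print(s.index)

# Either option turns it back into a two-column DataFrame.
out_a = df.groupby("A", as_index=False)["C"].sum()
out_b = s.reset_index()
print(out_b)
```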
Question 3
How to aggregate mainly string columns (to `list`s, `tuple`s, strings with separator)?
Instead of an aggregate function, it is possible to pass `list`, `tuple` or `set` to convert the column.
An alternative is to use `GroupBy.apply`:
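As a rough illustration on made-up data, the built-in container types can be passed directly as the aggregating callable:

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two", "one", "two"],
})

# Collect each group's values into a container.
as_lists = df.groupby("A")["B"].agg(list)
as_tuples = df.groupby("A")["B"].agg(tuple)
as_sets = df.groupby("A")["B"].agg(set)

# The same list result through GroupBy.apply.
via_apply = df.groupby("A")["B"].apply(list)

print(as_lists)
```

Note that `set` drops duplicates and does not preserve order, so it suits membership checks rather than display.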
To convert to strings with a separator, use `.join`, but only if the column contains strings.
For a numeric column, use a lambda function with `astype` to convert the values to `string`s.
Another solution is to convert to strings before the `groupby`.
To convert all columns, do not pass a list of columns after `groupby`. There is no column `D` in the result because of the automatic exclusion of 'nuisance' columns, which means all numeric columns are excluded. (This describes older pandas versions; newer ones may raise an error here instead of silently dropping columns.)
So it is necessary to convert all columns to strings first; then you get all columns:
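The joining variants can be sketched like this, assuming a made-up frame with one string column `B` and two numeric columns `C` and `D`:

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 9],
    "D": [10, 20, 30, 40, 50, 60],
})

# String column: join works directly.
joined_b = df.groupby("A")["B"].agg(",".join)

# Numeric column: cast to str inside the aggregating lambda.
joined_c = df.groupby("A")["C"].agg(lambda x: ",".join(x.astype(str)))

# Or cast before grouping.
joined_c2 = df.assign(C=df["C"].astype(str)).groupby("A")["C"].agg(",".join)

# All columns: cast the whole frame to str first, then join every column.
joined_all = df.astype(str).groupby("A").agg(",".join)

print(joined_all)
```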
Question 4
How to aggregate counts?
The function `GroupBy.size` returns the size of each group.
The function `GroupBy.count` excludes missing values.
It should be used on multiple columns to count the non-missing values in each:
The related function `Series.value_counts` returns an object containing the counts of unique values in descending order, so that the first element is the most frequently occurring one. It excludes `NaN` values by default.
If you want the same output as with `groupby` + `size`, add `Series.sort_index`:
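A brief comparison on made-up data (one missing key is included to show that both approaches skip it by default):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last key is missing.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo", np.nan]})

# Most frequent value first; NaN is left out by default.
vc = df["A"].value_counts()

# Sorting by index gives the same order as groupby + size.
vc_sorted = vc.sort_index()
sizes = df.groupby("A").size()

print(vc)
print(vc_sorted)
```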
Question 5
How to create a new column filled with aggregated values?
The method `GroupBy.transform` returns an object that is indexed the same (same size) as the one being grouped.
See the pandas documentation for more information.
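As a minimal sketch on made-up data, the per-group sum can be broadcast back to every row (the new column name `C_sum` is an illustrative choice):

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 9],
})

# transform keeps the original index, so the result aligns row by row.
df["C_sum"] = df.groupby("A")["C"].transform("sum")
print(df)
```

Because the result has the same length as the original frame, it can be assigned directly without a merge.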