How can I filter a Pandas GroupBy object and obtai

Posted 2020-08-15 10:21

Question:

Calling filter on the result of a Pandas groupby operation returns a dataframe. If I then want to perform further group computations, I have to call groupby again, which seems roundabout. Is there a more idiomatic way of doing this?

EDIT:

To illustrate what I'm talking about:

We shamelessly steal a toy dataframe from the Pandas docs, and group:

>>> import numpy as np
>>> import pandas as pd
>>> dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
>>> grouped = dff.groupby('B')
>>> type(grouped)
<class 'pandas.core.groupby.DataFrameGroupBy'>

This returns a groupby object over which we can iterate, perform group-wise operations, etc. But if we filter:

>>> filtered = grouped.filter(lambda x: len(x) > 2)
>>> type(filtered)
<class 'pandas.core.frame.DataFrame'>

We get back a dataframe. Is there a nice idiomatic way of obtaining the filtered groups back, instead of just the original rows which belonged to the filtered groups?

Answer 1:

If you want to combine a filter and an aggregate, the best way I can think of is to put a conditional expression inside apply that returns None for the groups you want to drop, and then call dropna to remove those rows from the final result:

grouped.apply(lambda x: x.sum() if len(x) > 2 else None).dropna()
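For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the toy frame from the question. Selecting column 'A' before apply is my own assumption, made so that only the numeric column is summed; the names grouped_a and sums are just illustrative.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question.
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})

# Assumption: select column 'A' so that only the numeric column is aggregated.
grouped_a = dff.groupby('B')['A']

# Groups with 2 or fewer rows return None, which shows up as a missing value.
sums = grouped_a.apply(lambda x: x.sum() if len(x) > 2 else None)

# Drop the missing entries, leaving only the groups that passed the size check.
sums = sums.dropna()
print(sums)
```

Note that this relies on missing values to mark the dropped groups, so it would also drop a group whose aggregate is genuinely NaN.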

If you want to iterate through the groups, say to join them back together, you could use a generator expression:

pd.concat(g for _, g in grouped if len(g) > 2)
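As a rough sketch of how that fits together on the question's toy frame: the concat gives back a plain dataframe of the surviving rows, so a second groupby is still needed to get a groupby object. The names kept, regrouped and group_keys are illustrative only.

```python
import numpy as np
import pandas as pd

# Same toy frame and grouping as in the question.
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Keep only the sub-frames of groups with more than 2 rows and stitch them together.
kept = pd.concat(g for _, g in grouped if len(g) > 2)

# The result is a DataFrame, so group again to get a GroupBy over the kept rows.
regrouped = kept.groupby('B')
group_keys = list(regrouped.groups)
print(group_keys)
```

The rows in kept should match what grouped.filter(lambda x: len(x) > 2) returns for this frame, so this is mostly useful when you want to touch each group while iterating.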

Ultimately I think it would be better if groupby.filter had an option to return a groupby object.



Tags: python pandas