Pandas: Use groupby on each element of a list

Maybe I'm missing the obvious.

I have a pandas DataFrame that looks like this:

   id        product              categories
    0        Silmarillion         ['Book', 'Fantasy']
    1        Headphones           ['Electronic', 'Material']
    2        Dune                 ['Book', 'Sci-Fi']

I'd like to use the groupby function to count the number of appearances of each element in the categories column, so here the result would be:

Book       2
Fantasy    1
Electronic 1
Material   1
Sci-Fi     1

However, when I try using groupby, pandas counts the occurrences of the entire list instead of separating its elements. I have tried several ways of handling this, using tuples or splits, but thus far I've been unsuccessful.

Tags: python python-3.x pandas numpy pandas-groupby

3 answers

在下西门庆

#2 · 2020-07-13 08:26

You can also call pd.value_counts directly on a flat list.
You can generate that flat list via numpy.concatenate, itertools.chain, or cytoolz.concat.

import numpy as np
import pandas as pd
from cytoolz import concat
from itertools import chain

cytoolz.concat

pd.value_counts(list(concat(df.categories.values.tolist())))

itertools.chain

pd.value_counts(list(chain(*df.categories.values.tolist())))

numpy.unique + numpy.concatenate

u, c = np.unique(np.concatenate(df.categories.values), return_counts=True)
pd.Series(c, u)

All yield

Book          2
Electronic    1
Fantasy       1
Material      1
Sci-Fi        1
dtype: int64
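
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and counts the labels with itertools.chain.from_iterable. It assumes the categories column holds real Python lists (not strings that merely look like lists), and it counts with Series.value_counts rather than the top-level pd.value_counts, which I believe newer pandas releases deprecate. The names flat and counts are just illustrative.

```python
from itertools import chain

import pandas as pd

# Sample data copied from the question; categories holds Python lists.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [
        ["Book", "Fantasy"],
        ["Electronic", "Material"],
        ["Book", "Sci-Fi"],
    ],
})

# Flatten the per-row lists into one flat list, then count each label.
flat = list(chain.from_iterable(df["categories"]))
counts = pd.Series(flat).value_counts()
print(counts)
```

chain.from_iterable does the same job as chain(*...) above without unpacking every row into a function argument.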


我只想做你的唯一

#3 · 2020-07-13 08:33

You can normalize the records by stacking them then call value_counts():

pd.DataFrame(df['categories'].tolist()).stack().value_counts()
Out: 
Book          2
Fantasy       1
Material      1
Sci-Fi        1
Electronic    1
dtype: int64
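
A small sketch of what happens when the lists have different lengths, which the sample data does not show. The data here is hypothetical (not from the question), and wide and ragged_counts are illustrative names. Building a DataFrame from the lists pads the shorter rows with missing values, and value_counts() skips missing values by default, so the padding should not show up in the result.

```python
import pandas as pd

# Hypothetical data: rows with different numbers of categories.
categories = pd.Series([
    ["Book", "Fantasy", "Classic"],
    ["Electronic"],
    ["Book"],
])

# One column per list position; shorter rows are padded with missing values.
wide = pd.DataFrame(categories.tolist())

# stack() reshapes the frame into one long Series, and value_counts()
# ignores missing values by default, so only real labels are counted.
ragged_counts = wide.stack().value_counts()
print(ragged_counts)
```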


chillily

#4 · 2020-07-13 08:39

Try this:

In [58]: df['categories'].apply(pd.Series).stack().value_counts()
Out[58]:
Book          2
Fantasy       1
Electronic    1
Sci-Fi        1
Material      1
dtype: int64
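
As an alternative that none of the answers here use: on pandas 0.25 or later, Series.explode turns each list element into its own row, which should be cheaper than apply(pd.Series) on large frames. The sketch below assumes the column holds real lists. It also shows the groupby form the question asked for. The names exploded, explode_counts and grouped are illustrative.

```python
import pandas as pd

# Only the categories column from the question is needed here.
df = pd.DataFrame({
    "categories": [
        ["Book", "Fantasy"],
        ["Electronic", "Material"],
        ["Book", "Sci-Fi"],
    ],
})

# One row per list element; the original row index is repeated.
exploded = df["categories"].explode()

# Count with value_counts (sorted by frequency)...
explode_counts = exploded.value_counts()

# ...or with groupby, grouping the exploded values by themselves.
grouped = exploded.groupby(exploded).size()

print(explode_counts)
print(grouped)
```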
