Count the frequency that a value occurs in a dataframe column

Posted 2019-01-01 00:37

I have a dataset

category
cat a
cat b
cat a

I'd like to be able to return something like this (showing the unique values and their frequency):

category | freq
cat a    | 2
cat b    | 1

Tags: python pandas
14 Answers
孤独总比滥情好
#2 · 2019-01-01 00:43

If all values in your DataFrame have the same type, you can also set return_counts=True in numpy.unique().

index, counts = np.unique(df.values, return_counts=True)

np.bincount() could be faster if your values are integers.
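A minimal runnable sketch of this approach on the question's data (the `category` column name is taken from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# np.unique returns the sorted unique values; with return_counts=True
# it also returns how often each value occurs
values, counts = np.unique(df["category"].values, return_counts=True)
freq = dict(zip(values, counts.tolist()))
print(freq)  # {'cat a': 2, 'cat b': 1}
```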

冷夜・残月
#3 · 2019-01-01 00:44

This should work:

df.groupby('category').size()
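Applied to the example data from the question, a quick sketch (reset_index is added here to get the two-column layout the question asks for):

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# size() counts the rows in each group; reset_index turns the
# result into a DataFrame with 'category' and 'freq' columns
freq = df.groupby("category").size().reset_index(name="freq")
print(freq)
```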
大哥的爱人
#4 · 2019-01-01 00:44

@metatoaster has already pointed this out. Go for Counter. It's blazing fast.

import pandas as pd
from collections import Counter
import timeit
import numpy as np

df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])

Timings:

%timeit -n 10000 df['NumA'].value_counts()
# 10000 loops, best of 3: 715 µs per loop

%timeit -n 10000 df['NumA'].value_counts().to_dict()
# 10000 loops, best of 3: 796 µs per loop

%timeit -n 10000 Counter(df['NumA'])
# 10000 loops, best of 3: 74 µs per loop

%timeit -n 10000 df.groupby(['NumA']).count()
# 10000 loops, best of 3: 1.29 ms per loop

Cheers!

看淡一切
#5 · 2019-01-01 00:47

If you want to apply this to all columns, you can use:

df.apply(pd.value_counts)

This applies a column-based aggregation function (in this case value_counts) to each of the columns.
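For example (note that the top-level pd.value_counts has been deprecated in recent pandas versions, so the equivalent pd.Series.value_counts is used in this sketch; the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x"], "B": ["u", "u", "v"]})

# Each column is counted independently; the result's index is the
# union of all unique values, with NaN where a value does not occur
counts = df.apply(pd.Series.value_counts)
print(counts)
```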

有味是清欢
#6 · 2019-01-01 00:48
n_values = data.income.value_counts()

# first unique value count
n_at_most_50k = n_values[0]

# second unique value count
n_greater_50k = n_values[1]

Output of n_values:

<=50K    34014
>50K     11208
Name: income, dtype: int64

Output of (n_greater_50k, n_at_most_50k):

(11208, 34014)
怪性笑人.
#7 · 2019-01-01 00:49

Assuming you have a pandas DataFrame df, try:

df.category.value_counts()
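On the example data from the question, a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# value_counts returns a Series indexed by the unique values,
# sorted by frequency in descending order
freq = df["category"].value_counts()
print(freq["cat a"], freq["cat b"])  # 2 1
```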

The pandas documentation provides more information.
