I've heard that in Pandas there are often multiple ways to do the same thing, but I was wondering:
If I'm trying to group data by a value within a specific column and count the number of items with that value, when does it make sense to use `df.groupby('colA').count()` and when does it make sense to use `df['colA'].value_counts()`?
There is a difference: `value_counts` returns its result sorted by frequency (descending), but `count` does not; it sorts the output by the index (the groups created by the column passed to `groupby('col')`).

`df.groupby('col').count()` aggregates all columns of `df` with the `count` function, so it counts values excluding `NaN`s. If you need the count of only one column, you have to select that column after the `groupby`.

Sample:
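The sample data that originally accompanied this answer is not shown here, so below is a minimal sketch with an illustrative DataFrame (the column names `colA`/`colB` and the `NaN` are assumptions, not from the question):

```python
import pandas as pd
import numpy as np

# Illustrative data; the NaN shows how count() treats missing values.
df = pd.DataFrame({'colA': ['a', 'a', 'b', 'a', 'b'],
                   'colB': [1, 2, np.nan, 4, 5]})

# value_counts: frequencies of one column, sorted by count (descending).
print(df['colA'].value_counts())
# a    3
# b    2

# groupby + count: counts non-NaN values in every other column, sorted by the group index.
print(df.groupby('colA').count())
#       colB
# colA
# a        3
# b        1

# To count only one column, select it after the groupby:
print(df.groupby('colA')['colA'].count())
# colA
# a    3
# b    2
```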
`groupby` and `value_counts` are totally different functions. You cannot perform `value_counts` on a whole DataFrame (note: newer pandas versions do add `DataFrame.value_counts`, which instead counts unique row combinations). `value_counts` is limited to a single column or Series, and its sole purpose is to return a Series of the frequencies of its values.

`groupby` returns a grouped object on which you can perform statistical computations, so when you do `df.groupby(col).count()` it returns the number of non-null values present in each of the other columns, with respect to the groups of the specified `col`.

When should `value_counts` be used and when should `groupby.count` be used? Let's take an example.

Groupby count:
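A minimal sketch of what the groupby route returns, using illustrative data (the DataFrame and column names are assumptions, not from the original question):

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'b', 'a', 'b'],
                   'colB': [10, 20, 30, 40, 50]})

# count() reports the number of non-null values in every other column, per group of 'colA'.
print(df.groupby('colA').count())
#       colB
# colA
# a        3
# b        2
```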
To find the frequency of the grouping column itself using `groupby`, you need to aggregate against that column explicitly, as @jez did above (perhaps `value_counts` was implemented precisely to avoid this and make developers' lives easier).
Value Counts:
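And the same frequency table via `value_counts`, continuing with the illustrative `df` defined just above:

```python
# A Series of frequencies for a single column, sorted by count in descending order.
print(df['colA'].value_counts())
# a    3
# b    2
```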
In conclusion:

`.groupby(col).count()` should be used when you want to find the frequency of valid (non-null) values present in the other columns, with respect to the specified `col`.

`.value_counts()` should be used to find the frequencies of a single Series.