I've heard that in Pandas there are often multiple ways to do the same thing, but I was wondering:
If I'm trying to group data by a value within a specific column and count the number of items with that value, when does it make sense to use `df.groupby('colA').count()` and when does it make sense to use `df['colA'].value_counts()`?
There is a difference: `value_counts` returns its result sorted by frequency (descending), but `count` does not; it sorts the output by the index (the groups created by the column passed to `groupby('col')`).

`df.groupby('col').count()` aggregates all columns of `df` with the `count` function, so it counts values excluding `NaN`s. If you need the count of only one column, you have to select that column after the `groupby`.

Sample:
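The sample data that originally accompanied this answer is not shown here, so below is a minimal sketch with an illustrative DataFrame (the column names `colA`/`colB` and the `NaN` are assumptions, not from the question):

```python
import pandas as pd
import numpy as np

# Illustrative data; the NaN shows how count() treats missing values.
df = pd.DataFrame({'colA': ['a', 'a', 'b', 'a', 'b'],
                   'colB': [1, 2, np.nan, 4, 5]})

# value_counts: frequencies of one column, sorted by count (descending).
print(df['colA'].value_counts())
# a    3
# b    2

# groupby + count: counts non-NaN values in every other column, sorted by the group index.
print(df.groupby('colA').count())
#       colB
# colA
# a        3
# b        1

# To count only one column, select it after the groupby:
print(df.groupby('colA')['colA'].count())
# colA
# a    3
# b    2
```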
`groupby` and `value_counts` are totally different functions. You cannot perform `value_counts` on a whole DataFrame (note: newer pandas versions do add `DataFrame.value_counts`, which instead counts unique row combinations). `value_counts` is limited to a single column or Series, and its sole purpose is to return a Series of the frequencies of its values.

`groupby` returns a grouped object on which you can perform statistical computations, so when you do `df.groupby(col).count()` it returns the number of non-null values present in each of the other columns, with respect to the groups of the specified `col`.

When should `value_counts` be used and when should `groupby.count` be used? Let's take an example.

Groupby count:
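A minimal sketch of what the groupby route returns, using illustrative data (the DataFrame and column names are assumptions, not from the original question):

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'b', 'a', 'b'],
                   'colB': [10, 20, 30, 40, 50]})

# count() reports the number of non-null values in every other column, per group of 'colA'.
print(df.groupby('colA').count())
#       colB
# colA
# a        3
# b        2
```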
To find the frequency of the grouping column itself using `groupby`, you need to aggregate against that column explicitly, as @jez did above (perhaps `value_counts` was implemented precisely to avoid this and make developers' lives easier).
Value Counts:
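And the same frequency table via `value_counts`, continuing with the illustrative `df` defined just above:

```python
# A Series of frequencies for a single column, sorted by count in descending order.
print(df['colA'].value_counts())
# a    3
# b    2
```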
In conclusion:

`.groupby(col).count()` should be used when you want to find the frequency of valid (non-null) values present in the other columns, with respect to the specified `col`.

`.value_counts()` should be used to find the frequencies of a single Series.