I have the following DataFrame:
df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
I want to calculate the frequency of each value, but not as an overall count - the count of each value until it changes to another value.
I tried:
df['values'].value_counts()
but it gives me
10 6
9 3
23 2
12 1
The desired output is
10:2
23:2
9:3
10:4
12:1
How can I do this?
This is far from the most time/memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Please feel encouraged to suggest improvements on this method.
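A sketch of what such an iterative pass could look like (the `runs` list and the loop are illustrative, not the original poster's exact code):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Walk the column once and start a new [value, count] pair whenever the value changes.
runs = []
for value in df['values']:
    if runs and runs[-1][0] == value:
        runs[-1][1] += 1           # same value as the previous row: extend the current run
    else:
        runs.append([value, 1])    # value changed: start a new run

for value, count in runs:
    print(f'{value}:{count}')
# 10:2
# 23:2
# 9:3
# 10:4
# 12:1
```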
itertools.groupby does exactly this kind of run counting. Note that it returns a generator, so each group has to be consumed to get its length; see the sketch below.
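A minimal sketch of that idea (the list comprehension is just one way to consume the generator):

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# groupby yields a (key, group) pair for every run of consecutive equal values;
# each group is itself a lazy iterator, so materialize it to count its length.
counts = [(key, sum(1 for _ in group)) for key, group in groupby(df['values'])]
print(counts)
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```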
You can keep track of where the changes in df['values'] occur, and groupby the changes and also df['values'] (to keep the values as an index level), computing the size of each group, as sketched below.
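A sketch of that approach (the helper name `change` is illustrative):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# diff() is non-zero (or NaN on the first row) exactly where the value changes,
# so the cumulative sum of ne(0) gives every consecutive run its own number.
change = df['values'].diff().ne(0).cumsum().rename('change')

# Group by the run number and by the value itself (kept as an index level),
# then take the size of each group.
out = df.groupby([change, df['values']]).size()
print(out)
# change  values
# 1       10        2
# 2       23        2
# 3       9         3
# 4       10        4
# 5       12        1
# dtype: int64
```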
The function groupby in itertools can help you. It works for a str, and the same function also works for a list; see the sketch below. Note: for a df, always access the column as df['values'], because DataFrame already has a .values attribute, so attribute access would not give you the column.
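A sketch of such a helper (the function name `count_runs` is made up for the example):

```python
from itertools import groupby

def count_runs(iterable):
    """Return (element, run length) pairs for consecutive equal elements."""
    return [(key, len(list(group))) for key, group in groupby(iterable)]

# For a str:
print(count_runs('aabccc'))
# [('a', 2), ('b', 1), ('c', 3)]

# The same function also works for a list:
print(count_runs([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]))
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```

With the DataFrame from the question you would call it as `count_runs(df['values'])`.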
Using crosstab: cross-tabulate a run label against the values, then slightly modify the result to get the desired shape; see the sketch below.
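A possible sketch, assuming the run label is built with ne/shift/cumsum as in the other answers:

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Label each consecutive run, then cross-tabulate run label against value.
run = df['values'].ne(df['values'].shift()).cumsum().rename('run')
table = pd.crosstab(run, df['values'])
print(table)
# values  9   10  12  23
# run
# 1       0    2   0   0
# 2       0    0   0   2
# 3       3    0   0   0
# 4       0    4   0   0
# 5       0    0   0   1

# Slightly modify the result above: keep the non-zero cells and drop the run label.
out = table.stack()
out = out[out.ne(0)].droplevel('run')
print(out)
# values
# 10    2
# 23    2
# 9     3
# 10    4
# 12    1
# dtype: int64
```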
Based on Python's groupby:
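A sketch based on the standard library's itertools.groupby, rebuilding the result as a pandas Series with the run value as the index:

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Collect (run value, run length) pairs, then turn them into a Series.
runs = [(key, len(list(group))) for key, group in groupby(df['values'])]
out = pd.Series([count for _, count in runs], index=[key for key, _ in runs])
print(out)
# 10    2
# 23    2
# 9     3
# 10    4
# 12    1
# dtype: int64
```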
Use:
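A sketch consistent with the explanation at the end of this answer (group by the helper Series together with the values column):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

out = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
print(out)
# values  values
# 1       10        2
# 2       23        2
# 3       9         3
# 4       10        4
# 5       12        1
# dtype: int64
```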
Or:
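An alternative sketch using value_counts inside each group, which gives the same counts:

```python
out = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
```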
Last, remove the first level:
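For example, continuing from the snippet above (reset_index with drop=True is one way to drop that level):

```python
out = out.reset_index(level=0, drop=True)
print(out)
# values
# 10    2
# 23    2
# 9     3
# 10    4
# 12    1
# dtype: int64
```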
Explanation: Compare the original column with its shifted version using ne (not equal), and then take the cumsum to get the helper Series:
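Concretely, the intermediate comparison and the resulting helper Series look like this:

```python
print(df['values'].ne(df['values'].shift()))
# 0      True
# 1     False
# 2      True
# 3     False
# 4      True
# 5     False
# 6     False
# 7      True
# 8     False
# 9     False
# 10    False
# 11     True
# Name: values, dtype: bool

print(df['values'].ne(df['values'].shift()).cumsum())
# 0     1
# 1     1
# 2     2
# 3     2
# 4     3
# 5     3
# 6     3
# 7     4
# 8     4
# 9     4
# 10    4
# 11    5
# Name: values, dtype: int64
```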