In numpy/scipy, is there an efficient way to get frequency counts for unique values in an array?
Something along these lines:
x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = freq_count(x)
print(y)
>> [[1, 5], [2, 3], [5, 1], [25, 1]]
(For you R users out there, I'm basically looking for the table() function.)
Old question, but I'd like to provide my own solution, which turned out to be the fastest in my benchmarks: use a plain list instead of np.array as input (or convert to a list first). Check it out if you run into this as well.
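The answer's original code isn't preserved in this copy; a minimal sketch consistent with the description, counting over a plain list (the helper name freq_count_list is made up for illustration):

from collections import Counter

def freq_count_list(values):
    # Count occurrences of each value in a plain Python list.
    return Counter(values)

freq_count_list([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
# Counter({1: 5, 2: 3, 5: 1, 25: 1})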
For example,
100000 loops, best of 3: 2.26 µs per loop
100000 loops, best of 3: 8.8 µs per loop
100000 loops, best of 3: 5.85 µs per loop
In that test the accepted answer was slower, and the scipy.stats.itemfreq solution was even worse. However, more in-depth testing did not confirm this expectation; see the comments below on caching and other in-RAM side effects that massively influence the results of repeated tests on a small dataset.
Update: The method mentioned in the original answer is deprecated; we should use the new way instead:
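The replacement is presumably np.unique with its return_counts argument, which a later answer covers in full; a sketch:

import numpy as np

x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
np.array(np.unique(x, return_counts=True)).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])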
Original answer:
You can use scipy.stats.itemfreq:
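A sketch of its use (note that itemfreq was deprecated in SciPy 1.0 and removed in SciPy 1.3):

from scipy.stats import itemfreq  # removed in SciPy 1.3

x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
itemfreq(x)
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])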
Take a look at np.bincount: http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
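The answer's code block didn't survive this copy; the standard np.bincount pattern it refers to looks like this (bincount only handles non-negative integers):

import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = np.bincount(x)      # counts for every integer from 0 to x.max()
ii = np.nonzero(y)[0]   # the values that actually occur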
And then:
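A reconstruction of the presumably omitted pairing step:

zip(ii, y[ii])
# [(1, 5), (2, 3), (5, 1), (25, 1)]   (on Python 3, wrap in list(...))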
or:
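Or, reconstructed as a single 2-D array:

np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])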
or however you want to combine the counts and the unique values.
This is by far the most general and performant solution; surprised it hasn't been posted yet.
Unlike the currently accepted answer, it works on any datatype that is sortable (not just positive ints), and it has optimal performance; the only significant expense is in the sorting done by np.unique.
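To illustrate the "any sortable datatype" point, an added example (not from the original post) using strings:

import numpy as np

unique, counts = np.unique(np.array(['b', 'a', 'b', 'c', 'a']), return_counts=True)
# unique -> array(['a', 'b', 'c'], dtype='<U1'), counts -> array([2, 2, 1])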
As of Numpy 1.9, the easiest and fastest method is to simply use numpy.unique, which now has a return_counts keyword argument:
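(The answer's code block didn't survive this copy; this is the standard usage.)

import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
unique, counts = np.unique(x, return_counts=True)
print(np.asarray((unique, counts)).T)

Which gives:

[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]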
A quick comparison with scipy.stats.itemfreq:
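The original benchmark figures are missing from this copy; the comparison would have been run along these lines in IPython (no timings reproduced here):

import numpy as np
from scipy.stats import itemfreq  # removed in SciPy 1.3

x = np.random.randint(0, 100, size=100000)
%timeit np.unique(x, return_counts=True)
%timeit itemfreq(x)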
A different approach, using pandas:

import pandas as pd
import numpy as np

pd.Series(name_of_array).value_counts()
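For the question's example data this gives a Series of counts, sorted in descending order (the printed footer varies by pandas version):

pd.Series([1, 1, 1, 2, 2, 2, 5, 25, 1, 1]).value_counts()
# 1     5
# 2     3
# 5     1
# 25    1
# dtype: int64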