One last newbie pandas question for the day: How do I generate a frequency table for a single Series?
For example:
my_series = pandas.Series([1,2,2,3,3,3])
pandas.magical_frequency_function( my_series )
>> {
1 : 1,
2 : 2,
3 : 3
}
Lots of googling has led me to Series.describe() and pandas.crosstab, but neither of these does quite what I need: one variable, counts by categories. Oh, and it'd be nice if it worked for different data types: strings, ints, etc.
The answer provided by @DSM is simple and straightforward, but I thought I'd add my own input to this question. If you look at the code for pandas.value_counts, you'll see that there is a lot going on.
If you need to calculate the frequencies of many Series, this could take a while. An alternative that may be faster is numpy.unique with return_counts=True.
Here is an example:
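As a baseline for the comparison, here is a minimal sketch of the plain value_counts call on the series from the question. The variable name counts is my own choice, not something from the original post.

```python
import pandas as pd

my_series = pd.Series([1, 2, 2, 3, 3, 3])

# value_counts() maps each unique value (the index) to its count (the values),
# sorted by count in descending order.
counts = my_series.value_counts()

print(type(counts).__name__)  # Series
print(counts.to_dict())       # {3: 3, 2: 2, 1: 1}
```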
Notice here that the item returned is a pandas.Series
In comparison, numpy.unique returns a tuple with two items, the unique values and the counts. You can then combine these into a dictionary:
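A rough sketch of that step, assuming the same six-element series as in the question. The names vals, counts and results are just illustrative, and the .tolist() calls are my addition so the dictionary holds plain Python values rather than NumPy scalars.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1, 2, 2, 3, 3, 3])

# With return_counts=True, numpy.unique returns two arrays:
# the sorted unique values and how often each one occurs.
vals, counts = np.unique(my_series.to_numpy(), return_counts=True)

# Pair each unique value with its count.
results = dict(zip(vals.tolist(), counts.tolist()))
print(results)  # {1: 1, 2: 2, 3: 3}
```

The same call also works on a Series of strings, since numpy.unique only needs values it can sort.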
And then into a pandas.Series.
This generates the following output:
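Putting the pieces together, something along these lines should work. It is a sketch that rebuilds the dictionary first so it runs on its own; freq is a name I picked for illustration.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1, 2, 2, 3, 3, 3])

# Unique values and their counts, as two NumPy arrays.
vals, counts = np.unique(my_series.to_numpy(), return_counts=True)
results = dict(zip(vals.tolist(), counts.tolist()))

# A dict passed to pandas.Series becomes index (keys) and values (counts).
freq = pd.Series(results)
print(freq)
# 1    1
# 2    2
# 3    3
# dtype: int64
```

Unlike value_counts, the result here is ordered by value rather than by count, because numpy.unique returns the unique values sorted.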
Maybe .value_counts()?
You can use a list comprehension on a DataFrame to count the frequencies of each column, like so:
Breakdown:
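The original snippet is not reproduced here, so the following is only a sketch of what such a comprehension might look like, split into steps. The DataFrame df, its column names, and its contents are all made up for illustration.

```python
import pandas as pd

# Hypothetical DataFrame, invented for this example.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "M"],
    "price": [1.0, 2.5, 1.0],
})

# Step 1: df.columns gives the column labels to iterate over.
# Step 2: df[c] selects one column as a Series.
# Step 3: value_counts() builds the frequency table for that Series.
frequencies = [df[c].value_counts() for c in df.columns]

# One frequency table per column, in column order.
for col, table in zip(df.columns, frequencies):
    print(col, table.to_dict())
```

If only some columns are of interest, the iterable in the comprehension can be swapped for an explicit list of column names.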