Is there a convenient way to calculate percentiles for a sequence or single-dimensional numpy array?
I am looking for something similar to Excel's percentile function.
I looked in NumPy's statistics reference, and couldn't find this. All I could find is the median (50th percentile), but not something more specific.
Starting with Python 3.8, the standard library comes with the `quantiles` function as part of the `statistics` module. `quantiles` returns, for a given distribution `dist`, a list of `n - 1` cut points separating the `n` quantile intervals (division of `dist` into `n` continuous intervals with equal probability), where `n`, in our case (percentiles), is `100`.
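A minimal sketch of that call, assuming Python 3.8 or later. The sample data and the variable names `dist` and `cuts` are illustrative only:

```python
from statistics import quantiles

# Illustrative data: the integers 1..100.
dist = list(range(1, 101))

# n=100 asks for percentiles, so 99 cut points come back.
cuts = quantiles(dist, n=100)

# cuts[k - 1] is the k-th percentile.
median = cuts[49]
p90 = cuts[89]
print(len(cuts), median, p90)
```

Note that `quantiles` uses `method='exclusive'` by default; passing `method='inclusive'` treats the data as the whole population, which may be closer to what other tools report.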
For a series, you can use the `describe` function.
Suppose you have a DataFrame `df` with the columns `sales` and `id`, and you want to calculate percentiles for `sales`. It works like this:
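A minimal sketch of the pandas approach, assuming pandas is installed. The data is made up for illustration, and the `quantile` call is an addition for fetching a single percentile directly:

```python
import pandas as pd

# Illustrative frame with the two columns mentioned above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# describe() accepts the percentiles to report, as fractions in [0, 1].
summary = df["sales"].describe(percentiles=[0.1, 0.5, 0.9])
print(summary)

# For one percentile only, Series.quantile returns a scalar.
p90 = df["sales"].quantile(0.9)
print(p90)
```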
By the way, there is a pure-Python implementation of the percentile function, in case one doesn't want to depend on SciPy. The function is copied below:
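A minimal sketch of what such a pure-Python function might look like, assuming linear interpolation between the two closest ranks. It is not necessarily identical to the implementation referred to above, and the name `percentile` and its signature are illustrative:

```python
import math


def percentile(values, percent):
    """Return the given percentile (0..100) of values.

    Uses linear interpolation between the two closest ranks.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    ordered = sorted(values)
    # Fractional index of the requested percentile in the sorted data.
    k = (len(ordered) - 1) * percent / 100.0
    lower = math.floor(k)
    upper = math.ceil(k)
    if lower == upper:
        # The index is a whole number, so no interpolation is needed.
        return float(ordered[lower])
    # Weight each neighbor by its distance from the fractional index.
    return ordered[lower] * (upper - k) + ordered[upper] * (k - lower)


print(percentile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 25))
```

Under this rule the result should agree with NumPy's default (linear) percentile for the same input.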
In case you need the answer to be a member of the input numpy array:
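One way to do this, sketched under the assumption of NumPy 1.22 or later (where the keyword is `method`; older releases spell it `interpolation`). The array is illustrative:

```python
import numpy as np

a = np.array([15, 20, 35, 40, 50])

# "nearest" picks the element closest to the requested position,
# so the result is always a member of the input array.
p30 = np.percentile(a, 30, method="nearest")
print(p30)
```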
Just to add: by default, the percentile function in NumPy calculates the output as a linear weighted average of the two neighboring entries in the input vector. In some cases people may want the returned percentile to be an actual element of the vector. In that case, from v1.9.0 onwards you can use the "interpolation" option with either "lower", "higher" or "nearest" (since NumPy 1.22 this option is named "method", and "interpolation" is deprecated).
With one of these options the result is an actual entry in the vector, whereas the default result is a linear interpolation of the two vector entries that border the percentile.
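To make the difference concrete, here is a small sketch that compares the default result with the "lower" and "higher" options and rebuilds the default by hand. It assumes NumPy 1.22 or later for the `method` keyword; the data and variable names are illustrative:

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = 25

linear = np.percentile(v, q)                   # default: linear interpolation
lower = np.percentile(v, q, method="lower")    # neighbor just below the position
higher = np.percentile(v, q, method="higher")  # neighbor just above the position

# The default is a weighted average of the two bordering entries.
pos = (len(v) - 1) * q / 100   # fractional position in the sorted data
frac = pos - int(pos)          # distance past the lower neighbor
manual = lower + frac * (higher - lower)

print(linear, lower, higher, manual)
```

Here `lower` and `higher` are elements of `v`, while `linear` falls between them.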