Is there a convenient way to calculate percentiles for a sequence or single-dimensional numpy array?
I am looking for something similar to Excel's percentile function.
I looked in NumPy's statistics reference, and couldn't find this. All I could find is the median (50th percentile), but not something more specific.
To calculate the percentile of a series, run:
For example:
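The original snippet is missing here. Reading "the percentile of a series" as the percentile rank of each element (the fraction of the series at or below it), a NumPy-only sketch might look like this. The helper name percentile_ranks is hypothetical, and tied values all receive the highest rank in their group, which is only one of several possible tie conventions.

```python
import numpy as np

def percentile_ranks(values):
    """Hypothetical helper: fraction of `values` that is <= each element."""
    a = np.asarray(values)
    ordered = np.sort(a)
    # side="right" counts every element equal to x as being "at or below" x
    counts = np.searchsorted(ordered, a, side="right")
    return counts / len(a)

print(percentile_ranks([10, 20, 20, 30]).tolist())  # [0.25, 0.75, 0.75, 1.0]
```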
Check the scipy.stats module:
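A minimal sketch using scipy.stats.scoreatpercentile, assuming SciPy is installed. Its interpolation_method parameter accepts "fraction" (the default), "lower" and "higher". SciPy's own documentation suggests numpy.percentile for modern NumPy, so this is mainly useful in code that already depends on SciPy.

```python
import numpy as np
from scipy import stats

a = np.arange(100)  # 0, 1, ..., 99

# Default "fraction" method interpolates between the two closest values
print(stats.scoreatpercentile(a, 50))  # 49.5

# "lower" and "higher" return an actual element of the data instead
print(stats.scoreatpercentile(a, 50, interpolation_method="lower"))
print(stats.scoreatpercentile(a, 50, interpolation_method="higher"))
```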
The definition of percentile I usually see expects the result to be the value from the supplied list below which P percent of the values are found, which means the result must be an element of the set, not an interpolation between set elements. To get that, you can use a simpler function.
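A minimal pure-Python sketch of that idea, assuming P is given in percent (0 to 100). It returns the smallest element of the list such that at least P percent of the values are strictly smaller than it, and falls back to the maximum when no element qualifies (for example at P = 100). The name percentile_below is hypothetical.

```python
import bisect

def percentile_below(values, p):
    """Hypothetical helper: smallest element with at least p% of values strictly below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    target = p * len(ordered) / 100  # how many values must lie strictly below the result
    for x in ordered:
        # bisect_left gives the number of elements strictly smaller than x
        if bisect.bisect_left(ordered, x) >= target:
            return x
    return ordered[-1]  # no element has p% strictly below it (e.g. p = 100)

print(percentile_below([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 30))  # 4
```

Because the search uses counts rather than positions, duplicated values are handled consistently.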
If you would rather get the value from the supplied list at or below which P percent of values are found, then use this simple modification:
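A sketch of that variant under the same assumptions (P in percent; the name percentile_at_or_below is hypothetical). It returns the smallest element such that at least P percent of the values are less than or equal to it, which is the value the nearest-rank method produces.

```python
import bisect

def percentile_at_or_below(values, p):
    """Hypothetical helper: smallest element with at least p% of values <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    target = p * len(ordered) / 100  # how many values must be <= the result
    # bisect_right gives the number of elements <= x; the maximum always
    # qualifies (all len(ordered) values are <= it), so next() cannot fail
    return next(x for x in ordered if bisect.bisect_right(ordered, x) >= target)

print(percentile_at_or_below([15, 20, 35, 40, 50], 30))  # 20
```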
Or with the simplification suggested by @ijustlovemath:
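The scan can also be replaced by index arithmetic. The sketch below uses the standard nearest-rank formula, rank = ceil(P / 100 × n), clamped to at least 1; it is not necessarily the exact simplification @ijustlovemath proposed, and the name percentile_nearest_rank is hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """Hypothetical helper: nearest-rank percentile, p given in percent (0 to 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # 1-based rank; clamping to 1 makes p = 0 return the minimum
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

print(percentile_nearest_rank([15, 20, 35, 40, 50], 40))  # 20
```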
You might be interested in the SciPy Stats package. It has the percentile function you're after and many other statistical goodies.
percentile() is available in numpy too.
This ticket leads me to believe they won't be integrating percentile() into numpy anytime soon.
Here's how to do it without numpy, using only Python to calculate the percentile.
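A minimal sketch of such a pure-Python function, assuming you want the same linear interpolation between the two closest ranks that numpy.percentile uses by default. The name percentile_linear is hypothetical, and p is given in percent.

```python
import math

def percentile_linear(values, p):
    """Hypothetical helper: percentile with linear interpolation, p in percent (0 to 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100  # fractional 0-based position
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return ordered[int(k)]  # position falls exactly on an element
    # weight each neighbour by how close k is to it
    return ordered[lo] * (hi - k) + ordered[hi] * (k - lo)

print(percentile_linear([1, 2, 3, 4], 25))  # 1.75
```

Calling it with p = 50 gives the median.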
A convenient way to calculate percentiles for a one-dimensional sequence or NumPy array is numpy.percentile <https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html>. Example:
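For example, assuming NumPy is installed. The second argument is given in percent and may be a single number or a sequence of percentiles; a plain Python list works as input too.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(np.percentile(a, 50))        # 3.0, the median
print(np.percentile(a, [25, 75]))  # several percentiles in one call
```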
However, if your data contains any NaN value, numpy.percentile returns nan, so it is not useful there. In that case, use numpy.nanpercentile <https://docs.scipy.org/doc/numpy/reference/generated/numpy.nanpercentile.html>, which ignores NaN values:
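A short sketch contrasting the two functions on the same array, assuming NumPy is installed:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 5.0])
print(np.percentile(data, 50))     # nan: the NaN propagates into the result
print(np.nanpercentile(data, 50))  # 3.0: NaN is skipped, median of [1, 3, 5]
```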
With both functions you can also choose the interpolation mode (the interpolation parameter, renamed method in NumPy 1.22). The examples below make this easier to understand.
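A minimal comparison, assuming NumPy 1.22 or later, where the keyword is method (older releases call it interpolation). It evaluates the 40th percentile of a small array with each of the classic modes; the same keyword is accepted by numpy.nanpercentile.

```python
import numpy as np

a = np.array([10, 20, 30, 40])

# The 40th percentile sits at fractional position (4 - 1) * 0.40 = 1.2,
# between 20 (index 1) and 30 (index 2):
#   linear   -> interpolates, about 22
#   lower    -> 20, higher -> 30
#   midpoint -> average of the two neighbours, 25
#   nearest  -> the closer neighbour, 20
for method in ["linear", "lower", "higher", "midpoint", "nearest"]:
    print(method, np.percentile(a, 40, method=method))
```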
If your input array contains only integer values, you may want the percentile result as an integer as well. In that case, choose an interpolation mode such as 'lower', 'higher', or 'nearest', which always return one of the array's values.
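A sketch of the integer case, again assuming NumPy 1.22 or later for the method keyword. Since these modes pick an existing element, converting the result with int() loses nothing.

```python
import numpy as np

scores = np.arange(1, 11)  # integers 1..10

# Linear interpolation can land between two integers
print(np.percentile(scores, 25))  # 3.25

# These modes return one of the array's values
print(int(np.percentile(scores, 25, method="lower")))    # 3
print(int(np.percentile(scores, 25, method="higher")))   # 4
print(int(np.percentile(scores, 25, method="nearest")))  # 3
```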