I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and then compute a specific tail percentile loss. For example, 95% VaR is the 5th percentile figure in that time series.
I have my time series in a Pandas dataframe, and am currently using the pd.quantile() function to compute the percentile. My question is, typical market convention for VaR is use an exclusionary percentile (ie: 95% VaR is interpreted as: there is a 95% chance your portfolio will not loose MORE than the computed number) - akin to how MS Excel PERECENTILE.EXC() works. Pandas quantile() works akin to how Excel's PERCENTILE.INC() works - it includes the specified percentile. I have scoured several python math packages as well as this forum for a python solution that uses the same methodology as PERCENTILE.EXC() in Excel with no luck. I was hoping someone here might have a suggestion?
Here is sample code.
import pandas as pd
import numpy as np
test_pd = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
test_np = np.array([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
print 'pandas: ' + str(test_pd.quantile(.05))
print 'numpy: '+ str(np.percentile(test_np,5))
The answer i am looking for is -77.4
Thanks,
Ryan
It won't be as efficient as Pandas' own percentile but it should work:
Note that Excel fails for small percentiles; it is not a bug. It is because ranks that go below the minimum value is not suitable for interpolation. So you might want to check if rank > 0 in the
quantile_exc
function (see the assertion part).EDIT: I just saw your edit. I think you are making a mistake. The value -77.4 is actually the 99.5% percentile of your data. Try
test_pd.quantile(.005)
. I believe that you must have made a mistake in Excel when specifying your percentile.EDIT 2: I just tested it myself in Excel. For the 50-th percentile, I am getting the correct value in both Excel and Numpy/Pandas. For the 5th percentile however, I am getting -72 in Pandas/Numpy, and -74.6 in Excel. But Excel is just wrong here: it is very obvious that -74.6 is the 0.5th percentile, not the 5th...
FINAL EDIT: After some testing, it seems like Excel is behaving erratically around very small values of k with the
PERCENTILE.EXC()
function. Indeed, using the function with any k < 0.05 returns an error, so 0.05 must be a threshold below which the function is not working properly. I do not know why Excel chooses to return the 0.5th percentile when asked to exclude the 5th percentile (the logical behavior would be to return the 4.9th percentile, or the 4.99th...). However, both Numpy, Pandas and Excel return the same values for other values of k. For instance,PERCENTILE.EXC(0.5) = 6
, andtest_pd.quantile(0.5) = 6
as well. I guess the lesson is that we need to be wary of Excel's behavior ;).The way I understand your problem is: you want to know the value that corresponds to the k-th percentile of your data, this k-th percentile excluded. However,
pd.quantile()
returns the value that corresponds to your k-th percentile, this k-th percentile included.I do not think that pd.quantile() returning the k-th percentile included is an issue. Indeed, assuming you want all stocks having a Value at Risk strictly above the 5-th percentile, you would do:
Because you used a "smaller than" ( < ) operator, the values which exactly correspond to your 5-th percentile will be excluded, similar to Excel's PERCENTILE.EXC() function.
Do tell me if this is what you were looking for.
It seems that the implementation of
Percentile
function in Packageorg.apache.commons.math3.stat.descriptive.rank
ties out withPERCENTILE.EXC
function in Excel.Below is the Python implementation with some small tweaks.
Here
row
is apandas.Series