I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and then compute a specific tail percentile loss. For example, 95% VaR is the 5th percentile figure in that time series.
I have my time series in a pandas DataFrame, and am currently using the quantile() method to compute the percentile. My question is this: typical market convention for VaR is to use an exclusionary percentile (i.e., 95% VaR is interpreted as: there is a 95% chance your portfolio will not lose MORE than the computed number), akin to how MS Excel's PERCENTILE.EXC() works. pandas' quantile() works like Excel's PERCENTILE.INC(): it includes the specified percentile. I have scoured several Python math packages as well as this forum for a Python solution that uses the same methodology as PERCENTILE.EXC() in Excel, with no luck. I was hoping someone here might have a suggestion?
Here is sample code.
import pandas as pd
import numpy as np
test_pd = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
test_np = np.array([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
print('pandas: ' + str(test_pd.quantile(.05)))
print('numpy: ' + str(np.percentile(test_np, 5)))
The answer I am looking for is -77.4.
Thanks,
Ryan
It won't be as efficient as Pandas' own percentile but it should work:
def quantile_exc(ser, q):
    ser_sorted = ser.sort_values()
    rank = q * (len(ser) + 1) - 1
    assert rank > 0, 'quantile is too small'
    rank_l = int(rank)
    return ser_sorted.iat[rank_l] + (ser_sorted.iat[rank_l + 1] -
                                     ser_sorted.iat[rank_l]) * (rank - rank_l)
ser = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
quantile_exc(ser, 0.05)
Out: -77.400000000000006
quantile_exc(ser, 0.1)
Out: -68.399999999999991
quantile_exc(ser, 0.3)
Out: -2.0
Note that Excel fails for small percentiles; this is not a bug. A rank that falls below the position of the minimum value cannot be interpolated. So you might want to check that rank > 0 in the quantile_exc function (see the assertion).
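As a rough sketch of that check: assuming the exclusive method interpolates at the 1-based rank q * (n + 1), as quantile_exc above does, interpolation is only possible while that rank lies between 1 and n, i.e. for 1/(n + 1) <= q <= n/(n + 1). The helper name exc_valid_range is made up for illustration and uses only plain Python.

```python
def exc_valid_range(n):
    """Return the (lowest, highest) quantile for which the exclusive rank
    q * (n + 1) stays within the 1-based ranks 1..n.

    Assumption: the exclusive method interpolates at rank q * (n + 1).
    """
    if n < 1:
        raise ValueError("need at least one observation")
    return 1.0 / (n + 1), n / (n + 1.0)


# The sample series in the question has 21 observations.
low, high = exc_valid_range(21)
print(low, high)
```

For the 21 observations in the question this gives roughly 0.0455 to 0.9545, so q = 0.05 sits just inside the usable range.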
EDIT: I just saw your edit. I think you are making a mistake. The value -77.4 is actually the 0.5th percentile of your data. Try test_pd.quantile(.005). I believe that you must have made a mistake in Excel when specifying your percentile.
EDIT 2: I just tested it myself in Excel. For the 50th percentile, I am getting the correct value in both Excel and NumPy/pandas. For the 5th percentile, however, I am getting -72 in pandas/NumPy and -74.6 in Excel. But Excel is just wrong here: it is very obvious that -74.6 is the 0.5th percentile, not the 5th...
FINAL EDIT: After some testing, it seems that Excel behaves erratically for very small values of k with the PERCENTILE.EXC() function. Indeed, using the function with any k < 0.05 returns an error, so 0.05 must be a threshold below which the function does not work properly. I do not know why Excel chooses to return the 0.5th percentile when asked to exclude the 5th percentile (the logical behavior would be to return the 4.9th percentile, or the 4.99th...). However, NumPy, pandas, and Excel all return the same values for other values of k. For instance, PERCENTILE.EXC(0.5) = 6, and test_pd.quantile(0.5) = 6 as well. I guess the lesson is that we need to be wary of Excel's behavior ;).
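One way to compare the two conventions is to compute the interpolation position by hand. The sketch below assumes the inclusive method interpolates at the 0-based position q * (n - 1) and the exclusive method at q * (n + 1) - 1; the helper names are hypothetical and only plain Python is used.

```python
data = [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36, 5, 2, 46,
        -72, 3, -2, 7, 9, 34]


def interpolate_at(values, pos):
    """Linearly interpolate the sorted values at a fractional 0-based position."""
    ordered = sorted(values)
    if not 0 <= pos <= len(ordered) - 1:
        raise ValueError("position is outside the data")
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def quantile_inclusive(values, q):
    # Assumed inclusive convention: position q * (n - 1)
    return interpolate_at(values, q * (len(values) - 1))


def quantile_exclusive(values, q):
    # Assumed exclusive convention: position q * (n + 1) - 1
    return interpolate_at(values, q * (len(values) + 1) - 1)


inc_5 = quantile_inclusive(data, 0.05)    # position 1.0
exc_5 = quantile_exclusive(data, 0.05)    # position about 0.1
inc_05 = quantile_inclusive(data, 0.005)  # position about 0.1
print(inc_5, exc_5, inc_05)
```

With the 21 sample values, the exclusive 5th percentile and the inclusive 0.5th percentile both land at position 0.1 between -78 and -72, so both come out at about -77.4, while the inclusive 5th percentile is -72.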
The way I understand your problem is: you want to know the value that corresponds to the k-th percentile of your data, this k-th percentile excluded. However, pd.quantile() returns the value that corresponds to your k-th percentile, this k-th percentile included.
I do not think that quantile() returning the k-th percentile included is an issue. Indeed, assuming you want all stocks having a Value at Risk strictly below the 5-th percentile, you would do:
mask = data["VaR"] < data["VaR"].quantile(0.05)
data_filt = data[mask]
Because you used a "smaller than" ( < ) operator, the values which exactly correspond to your 5-th percentile will be excluded, similar to Excel's PERCENTILE.EXC() function.
Do tell me if this is what you were looking for.
It seems that the implementation of the Percentile function in the package org.apache.commons.math3.stat.descriptive.rank ties out with the PERCENTILE.EXC function in Excel.
Below is the Python implementation with some small tweaks. Here row is a pandas.Series.
row_sorted = row.sort_values()
n = len(row_sorted)
# indices start from 0, so we need to subtract 1
pos = quantile * (n + 1) - 1
if pos < 0:
    # pos falls before the first element: return the smallest element
    var = row_sorted.iat[0]
elif pos >= n - 1:
    # pos falls at or beyond the last element: return the largest element
    var = row_sorted.iat[n - 1]
else:
    # floor(pos)
    pos_lower = int(pos)
    # the fractional part of pos
    d = pos - pos_lower
    # the next element index
    pos_upper = pos_lower + 1
    # interpolate linearly between the two neighbouring values
    lower = row_sorted.iat[pos_lower]
    upper = row_sorted.iat[pos_upper]
    var = lower + (upper - lower) * d
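As a cross-check that needs no third-party packages, Python's standard library (3.8+) offers statistics.quantiles, whose method="exclusive" option appears to follow the same (n + 1) convention. A minimal sketch, assuming the sample data from the question; the variable names are only for illustration.

```python
import statistics

data = [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36, 5, 2, 46,
        -72, 3, -2, 7, 9, 34]

# n=100 returns the 99 cut points between percentiles,
# so index 4 holds the 5th percentile.
exclusive = statistics.quantiles(data, n=100, method="exclusive")
inclusive = statistics.quantiles(data, n=100, method="inclusive")

var_95_exclusive = exclusive[4]
var_95_inclusive = inclusive[4]
print(var_95_exclusive, var_95_inclusive)
```

On the sample data this should give about -77.4 for the exclusive method and -72 for the inclusive one, matching the two values discussed in the question.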