
Question:

I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and then compute a specific tail percentile loss. For example, 95% VaR is the 5th percentile figure in that time series.

I have my time series in a Pandas DataFrame and am currently using the quantile() method to compute the percentile. My question is this: typical market convention for VaR is to use an exclusive percentile (i.e., 95% VaR is interpreted as: there is a 95% chance your portfolio will not lose MORE than the computed number), akin to how Excel's PERCENTILE.EXC() works. Pandas' quantile() works like Excel's PERCENTILE.INC(): it includes the specified percentile. I have scoured several Python math packages, as well as this forum, for a Python solution that uses the same methodology as Excel's PERCENTILE.EXC(), with no luck. I was hoping someone here might have a suggestion.

Here is sample code.

import pandas as pd
import numpy as np

test_pd = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])
test_np = np.array([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])

print('pandas: ' + str(test_pd.quantile(.05)))    # pandas: -72.0
print('numpy: ' + str(np.percentile(test_np, 5)))  # numpy: -72.0

The answer I am looking for is -77.4.
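To see where -77.4 comes from: PERCENTILE.EXC places the k-th percentile at the 1-based rank k(n+1) of the sorted data, whereas PERCENTILE.INC (like the default linear method in pandas and NumPy) uses the 0-based position k(n-1). The sketch below replays both on the sample data in plain Python; `interpolate` is an illustrative helper, not a library function.

```python
values = [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36,
          5, 2, 46, -72, 3, -2, 7, 9, 34]
xs = sorted(values)
n = len(xs)  # 21 observations
k = 0.05


def interpolate(xs, pos):
    """Linear interpolation at a 0-based fractional position in sorted xs."""
    i = int(pos)
    frac = pos - i
    if frac == 0:
        return float(xs[i])
    return xs[i] + frac * (xs[i + 1] - xs[i])


# PERCENTILE.EXC: 1-based rank k * (n + 1) = 1.1, i.e. 0-based position 0.1
exc_value = interpolate(xs, k * (n + 1) - 1)
# PERCENTILE.INC and the pandas/NumPy default: 0-based position k * (n - 1) = 1.0
inc_value = interpolate(xs, k * (n - 1))

print(round(exc_value, 6), round(inc_value, 6))  # -77.4 -72.0
```

The exclusive rank lands one tenth of the way between the two smallest values (-78 and -72), while the inclusive position lands exactly on -72.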

Thanks,

Ryan

Answer 1:

It won't be as efficient as pandas' own quantile(), but it should work:

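import pandas as pd

def quantile_exc(ser, q):
    ser_sorted = ser.sort_values()
    n = len(ser)
    rank = q * (n + 1) - 1  # PERCENTILE.EXC rank, shifted to 0-based
    assert 0 <= rank <= n - 1, 'quantile is outside [1/(n+1), n/(n+1)]'
    rank_l = int(rank)
    if rank_l == n - 1:  # exactly on the largest value, nothing to interpolate
        return ser_sorted.iat[rank_l]
    return ser_sorted.iat[rank_l] + (ser_sorted.iat[rank_l + 1] -
                                     ser_sorted.iat[rank_l]) * (rank - rank_l)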

ser = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])

quantile_exc(ser, 0.05)
Out: -77.400000000000006

quantile_exc(ser, 0.1)
Out: -68.399999999999991

quantile_exc(ser, 0.3)
Out: -2.0

Note that Excel fails for small percentiles; this is not a bug. Ranks that fall below the minimum value (or above the maximum) cannot be interpolated, so you might want to check that the rank falls inside the sample in the quantile_exc function (see the assertion).
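To make that threshold concrete: with the 0-based rank q(n+1) - 1, interpolation needs 0 ≤ rank ≤ n - 1, which is the same as 1/(n+1) ≤ q ≤ n/(n+1). Presumably this is also the window in which Excel's PERCENTILE.EXC returns a value. A small sketch, with `exc_valid_range` as a hypothetical helper name:

```python
def exc_valid_range(n):
    """Smallest and largest q for which the 0-based rank q*(n+1) - 1
    lies inside [0, n - 1], i.e. can be interpolated."""
    return 1 / (n + 1), n / (n + 1)


q_min, q_max = exc_valid_range(21)
print(f"{q_min:.4f} {q_max:.4f}")  # 0.0455 0.9545
```

For the 21-point sample the lower bound is 1/22 ≈ 0.045, so q = 0.05 (rank 0.1) still works while noticeably smaller quantiles trip the assertion.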

Answer 2:

EDIT: I just saw your edit. I think you are making a mistake. The value -77.4 is actually the 0.5th percentile of your data; try test_pd.quantile(.005). I believe you must have made a mistake in Excel when specifying your percentile.

EDIT 2: I just tested it myself in Excel. For the 50th percentile, I get the same value in Excel and in NumPy/pandas. For the 5th percentile, however, I get -72 in pandas/NumPy and -74.6 in Excel. But Excel is just wrong here: it is very obvious that -74.6 is the 0.5th percentile, not the 5th...

FINAL EDIT: After some testing, it seems that Excel's PERCENTILE.EXC() behaves erratically for very small values of k. Indeed, calling the function with any k < 0.05 returns an error, so 0.05 must be a threshold below which the function does not work properly. I do not know why Excel chooses to return the 0.5th percentile when asked to exclude the 5th percentile (the logical behavior would be to return the 4.9th percentile, or the 4.99th...). However, NumPy, pandas and Excel all return the same values for other values of k. For instance, PERCENTILE.EXC(0.5) = 6, and test_pd.quantile(0.5) = 6 as well. I guess the lesson is that we need to be wary of Excel's behavior ;).
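One way to see why the two functions agree at k = 0.5 but not at k = 0.05 is to compare the 0-based positions each definition interpolates at: k(n-1) for PERCENTILE.INC and the pandas/NumPy default, and k(n+1) - 1 for PERCENTILE.EXC. Their difference is (1 - 2k) positions, which vanishes only at the median. A short sketch for the 21-point sample (`positions` is an illustrative name):

```python
def positions(k, n):
    """0-based interpolation positions for the INC and EXC conventions."""
    inc = k * (n - 1)        # PERCENTILE.INC, pandas/NumPy default 'linear'
    exc = k * (n + 1) - 1    # PERCENTILE.EXC
    return inc, exc


n = 21
for k in (0.05, 0.25, 0.5, 0.75, 0.95):
    inc, exc = positions(k, n)
    # inc - exc equals 1 - 2k: positive below the median, negative above it
    print(f"k={k:.2f}  inc={inc:.2f}  exc={exc:.2f}  gap={inc - exc:+.2f}")
```

At k = 0.05 the exclusive position (0.1) sits almost a full slot below the inclusive one (1.0), which is the gap between -77.4 and -72.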

The way I understand your problem is this: you want the value that corresponds to the k-th percentile of your data, with that k-th percentile itself excluded. However, quantile() returns the value that corresponds to the k-th percentile with that percentile included.

I do not think that quantile() including the k-th percentile is an issue. Assuming you want all stocks whose Value at Risk is strictly below the 5th percentile, you would do:

mask = data["VaR"] < data["VaR"].quantile(0.05)
data_filt = data[mask]

Because you used a "smaller than" (<) operator, values exactly equal to your 5th percentile are excluded, similar to Excel's PERCENTILE.EXC() function.
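A runnable version of the filter, assuming a hypothetical DataFrame `data` whose `VaR` column holds the question's sample numbers. It contrasts `<` with `<=` so you can see which rows the strict comparison drops:

```python
import pandas as pd

# Hypothetical data: the question's sample values stored in a "VaR" column
data = pd.DataFrame({"VaR": [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2,
                             -36, 5, 2, 46, -72, 3, -2, 7, 9, 34]})

threshold = data["VaR"].quantile(0.05)      # inclusive (linear) 5th percentile

strict = data[data["VaR"] < threshold]      # drops the row equal to the threshold
inclusive = data[data["VaR"] <= threshold]  # keeps it

print(threshold, len(strict), len(inclusive))  # -72.0 1 2
```

Note that this changes which rows are selected, not the percentile value itself: the threshold is still -72.0 rather than the -77.4 that PERCENTILE.EXC reports.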

Do tell me if this is what you were looking for.

Answer 3:

It seems that the implementation of the Percentile class in the org.apache.commons.math3.stat.descriptive.rank package (Apache Commons Math) ties out with Excel's PERCENTILE.EXC function.

Below is the Python implementation with some small tweaks.

Here row is a pandas.Series and quantile is a fraction such as 0.05:

import pandas as pd

def percentile_exc(row, quantile):
    row_sorted = row.sort_values()
    n = len(row_sorted)
    # index starts from 0, so we need to subtract 1
    pos = quantile * (n + 1) - 1
    # If pos < 0, return the smallest element in the array.
    if pos < 0:
        return row_sorted.iat[0]
    # Else if pos >= n - 1, return the largest element in the array.
    if pos >= n - 1:
        return row_sorted.iat[n - 1]
    # floor(pos)
    pos_lower = int(pos)
    # the fractional part of pos
    d = pos - pos_lower
    # the next element index
    pos_upper = pos_lower + 1
    # interpolate between the two neighbouring values
    lower = row_sorted.iat[pos_lower]
    upper = row_sorted.iat[pos_upper]
    var = lower + (upper - lower) * d
    return var
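If NumPy 1.22 or later is available, `np.percentile` accepts a `method` argument. `method="weibull"` (Hyndman and Fan's type 6) uses the same p(n+1) ranking as the code above and, like it, clamps to the minimum or maximum when the rank falls outside the sample. A minimal cross-check, assuming that NumPy version:

```python
import numpy as np
import pandas as pd

ser = pd.Series([15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2,
                 -36, 5, 2, 46, -72, 3, -2, 7, 9, 34])

# 'weibull' = Hyndman & Fan type 6: 1-based rank p * (n + 1), as in PERCENTILE.EXC
var_95 = np.percentile(ser.to_numpy(), 5, method="weibull")
print(var_95)  # approximately -77.4

# Below 1/(n+1) NumPy clamps to the minimum instead of raising an error
print(np.percentile(ser.to_numpy(), 1, method="weibull"))  # -78.0
```

pandas' own `Series.quantile` does not offer this method (its `interpolation` options are `linear`, `lower`, `higher`, `midpoint` and `nearest`), so the Series is converted to a NumPy array first.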