Weighted percentile using numpy

Asked 2019-01-17 04:01

Is there a way to use the numpy.percentile function to compute weighted percentile? Or is anyone aware of an alternative python function to compute weighted percentile?

thanks!

9 Answers
Answer from 霸刀☆藐视天下 (2019-01-17 04:57)
import numpy as np

def weighted_percentile(a, percentile=np.array([75, 25]), weights=None):
    """
    O(n log n) implementation of a weighted percentile.
    `a` and `weights` are numpy arrays; `percentile` is given in [0, 100].
    """
    percentile = np.array(percentile) / 100.0
    if weights is None:
        weights = np.ones(len(a))
    # Sort the data, carry the weights along, and build the weighted ECDF.
    a_indsort = np.argsort(a)
    a_sort = a[a_indsort]
    weights_sort = weights[a_indsort]
    ecdf = np.cumsum(weights_sort)

    percentile_index_positions = percentile * (weights.sum()-1)+1
    # need the 1 offset at the end due to ecdf not starting at 0
    locations = np.searchsorted(ecdf, percentile_index_positions)

    out_percentiles = np.zeros(len(percentile_index_positions))

    for i, empiricalLocation in enumerate(locations):
        # iterate across the requested percentiles 
        if ecdf[empiricalLocation-1] == np.floor(percentile_index_positions[i]):
            # i.e. is the percentile in between 2 separate values
            uppWeight = percentile_index_positions[i] - ecdf[empiricalLocation-1]
            lowWeight = 1 - uppWeight

            out_percentiles[i] = a_sort[empiricalLocation-1] * lowWeight + \
                                 a_sort[empiricalLocation] * uppWeight
        else:
            # i.e. the percentile is entirely in one bin
            out_percentiles[i] = a_sort[empiricalLocation]

    return out_percentiles

This is my function; it gives identical behaviour to

np.percentile(np.repeat(a, weights), percentile)

but with less memory overhead. np.percentile is an O(n) implementation, so the np.repeat version is potentially faster when the weights are small. This version has all the edge cases sorted out; it is an exact solution. The interpolation-based answers assume the weighted CDF is linear, when it is really a step function in most cases, except where a weight is 1.

Say we have data [1, 2, 3] with weights [3, 7, 11] and I want the 25th percentile. My ECDF is going to be [3, 10, 21], and I'm looking for the 5th value (zero-indexed) of the expanded array. Interpolation will see (3, 1) and (10, 2) as the matches and interpolate, giving roughly 1.29, despite the target lying entirely in the 2nd bin, whose value is 2.
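A minimal sketch of that example, checking the function above against the brute-force expansion (the interpolation line is added here only for illustration):

import numpy as np

a = np.array([1, 2, 3])
w = np.array([3, 7, 11])

# Exact 25th percentile via brute-force expansion of the weights.
print(np.percentile(np.repeat(a, w), 25))                  # 2.0

# weighted_percentile reproduces it without building the expanded array.
print(weighted_percentile(a, np.array([25]), weights=w))   # [2.]

# Interpolating on the raw ECDF instead lands between the bins.
print(np.interp(5, np.cumsum(w), a))                       # ~1.29 rather than 2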

Answer from 我欲成王,谁敢阻挡 (2019-01-17 04:57)

Here is my solution:

import numpy as np

def my_weighted_perc(data, perc, weights=None):
    if weights is None:
        return np.nanpercentile(data, perc)
    # Drop entries where either the data or the weight is NaN.
    mask = (~np.isnan(data)) & (~np.isnan(weights))
    d = data[mask]
    w = weights[mask]
    # Sort the data and carry the weights along.
    ix = np.argsort(d)
    d = d[ix]
    wei = w[ix]
    # Weighted CDF in percent, then interpolate the requested percentile(s).
    wei_cum = 100.0 * np.cumsum(wei) / np.sum(wei)
    return np.interp(perc, wei_cum, d)

It simply calculates the weighted CDF of the data and then uses it to estimate the weighted percentiles.
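A quick usage sketch, reusing the [1, 2, 3] / [3, 7, 11] example from the previous answer (the NaN entry is only there to show the filtering):

import numpy as np

data = np.array([1.0, 2.0, 3.0, np.nan])
w = np.array([3.0, 7.0, 11.0, 5.0])

# Linear interpolation on the weighted CDF: roughly 1.32 here,
# rather than the exact step-function answer of 2 discussed above.
print(my_weighted_perc(data, 25, weights=w))

# With no weights it falls back to np.nanpercentile: 1.5.
print(my_weighted_perc(data, 25))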

Answer from 放我归山 (2019-01-17 04:59)

Completely vectorized numpy solution

Here is the code I'm using. It's not optimal (I wasn't able to write an optimal one in numpy), but it's still much faster and more reliable than the accepted solution.

import numpy

def weighted_quantile(values, quantiles, sample_weight=None,
                      values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with the quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of initial array
    :param old_style: if True, will correct output to be consistent with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = numpy.array(values)
    quantiles = numpy.array(quantiles)
    if sample_weight is None:
        sample_weight = numpy.ones(len(values))
    sample_weight = numpy.array(sample_weight)
    assert numpy.all(quantiles >= 0) and numpy.all(quantiles <= 1), \
        'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = numpy.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    weighted_quantiles = numpy.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # To be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= numpy.sum(sample_weight)
    return numpy.interp(quantiles, weighted_quantiles, values)

Examples:

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.])
# array([ 1. ,  3.2,  9. ])

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.], sample_weight=[2, 1, 2, 4, 1])
# array([ 1. ,  3.2,  9. ])
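As a small sketch of what old_style changes (added here for illustration): with uniform weights, old_style=True reproduces numpy.percentile exactly, while the default convention puts half of each point's weight on either side of it, so interior quantiles can differ.

import numpy

vals = [1, 2, 9, 3.2, 4]

# old_style=True matches numpy.percentile under uniform weights.
print(weighted_quantile(vals, [0.25], old_style=True))   # [2.]
print(numpy.percentile(vals, 25))                        # 2.0

# The default convention interpolates between the midpoints of each
# point's weight, giving a slightly different interior quantile.
print(weighted_quantile(vals, [0.25]))                   # [1.75]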
