A weighted version of random.choice

2018-12-31 03:41发布

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(space.keys() + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.

20条回答
流年柔荑漫光年
2楼-- · 2018-12-31 04:03

I'd require the sum of choices is 1, but this works anyway

def weightedChoice(choices):
    # Safety check, you can remove it
    for c,w in choices:
        assert w >= 0


    tmp = random.uniform(0, sum(c for c,w in choices))
    for choice,weight in choices:
        if tmp < weight:
            return choice
        else:
            tmp -= weight
     raise ValueError('Negative values in input')
查看更多
裙下三千臣
3楼-- · 2018-12-31 04:04

As of Python v3.6, random.choices could be used to return a list of elements of specified size from the given population with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

  • population : list containing unique observations. (If empty, raises IndexError)

  • weights : More precisely relative weights required to make selections.

  • cum_weights : cumulative weights required to make selections.

  • k : size(len) of the list to be outputted. (Default len()=1)


Few Caveats:

1) It makes use of weighted sampling with replacement so the drawn items would be later replaced. The values in the weights sequence in itself do not matter, but their relative ratio does.

Unlike np.random.choice which can only take on probabilities as weights and also which must ensure summation of individual probabilities upto 1 criteria, there are no such regulations here. As long as they belong to numeric types (int/float/fraction except Decimal type) , these would still perform.

>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

3) cum_weights are typically a result of itertools.accumulate function which are really handy in such situations.

From the documentation linked:

Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So, either supplying weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome and the latter seems to be more faster / efficient.

查看更多
人气声优
4楼-- · 2018-12-31 04:06

A general solution:

import random
def weighted_choice(choices, weights):
    total = sum(weights)
    treshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < treshold:
            return choices[k]
查看更多
时光乱了年华
5楼-- · 2018-12-31 04:06

It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then, the time complexity using np.random.choice() each time is O(K(n + log(n))) when n is the number of items in the distribution.

In my case, I needed to sample the same distribution multiple times of the order of 10^3 where n is of the order of 10^6. I used the below code, which precomputes the cumulative distribution and samples it in O(log(n)). Overall time complexity is O(n+K*log(n)).

import numpy as np

n,k = 10**6,10**3

# Create dummy distribution
a = np.array([i+1 for i in range(n)])
p = np.array([1.0/n]*n)

cfd = p.cumsum()
for _ in range(k):
    x = np.random.uniform()
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]
查看更多
妖精总统
6楼-- · 2018-12-31 04:13

I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

def choose_index(probabilies):
    cmf = probabilies[0]
    choice = random.random()
    for k in xrange(len(probabilies)):
        if choice <= cmf:
            return k
        else:
            cmf += probabilies[k+1]

No need to sort your probabilities or create a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with average running time ~ N/2.

If you have weights, simply add one line:

def choose_index(weights):
    probabilities = weights / sum(weights)
    cmf = probabilies[0]
    choice = random.random()
    for k in xrange(len(probabilies)):
        if choice <= cmf:
            return k
        else:
            cmf += probabilies[k+1]
查看更多
还给你的自由
7楼-- · 2018-12-31 04:14
import numpy as np
w=np.array([ 0.4,  0.8,  1.6,  0.8,  0.4])
np.random.choice(w, p=w/sum(w))
查看更多
登录 后发表回答