I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]. The array sums up to 1.0.
Using plain Python or numpy, I want to draw elements in proportion to their probability: the first element about 10% of the time, the second 20%, the third 50%, etc. The "draw" should return the index of the element drawn.
I came up with this:
    import numpy

    def draw(probs):
        cumsum = numpy.cumsum(probs / sum(probs))  # sum up to 1.0, just in case
        return len(numpy.where(numpy.random.rand() >= cumsum)[0])
It works, but it's too convoluted; there must be a better way. Thanks.
You want to sample from the categorical distribution. NumPy has no function by that name, but the multinomial distribution is a generalization of the categorical distribution, so a multinomial draw with a single trial can be used for that purpose.
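A minimal sketch of the single-trial idea, assuming `probs` is a sequence of non-negative values that sums to 1. The function name `draw_multinomial` is made up for illustration. `numpy.random.multinomial(1, probs)` returns a one-hot vector of counts, and `argmax` recovers the index of the category that was drawn.

```python
import numpy as np

def draw_multinomial(probs):
    # One trial gives a count vector with a single 1 in it,
    # e.g. [0, 0, 1, 0] when the third element is drawn.
    counts = np.random.multinomial(1, probs)
    # The position of that 1 is the index of the drawn element.
    return int(np.argmax(counts))

index = draw_multinomial([0.1, 0.2, 0.5, 0.2])
```

For many draws at once, `numpy.random.choice(len(probs), size=n, p=probs)` is another option in NumPy versions that support the `p` argument.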
Using bisect should do the trick.
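The code for this answer is not shown here, so the following is only a guess at what a bisect-based draw could look like, not the answerer's original. The name `draw_bisect` is hypothetical. It builds the running totals once and then binary-searches them for a uniform random number.

```python
import bisect
import itertools
import random

def draw_bisect(probs):
    # Running totals, e.g. [0.1, 0.3, 0.8, 1.0] for the example array.
    cumulative = list(itertools.accumulate(probs))
    # Uniform number in [0, total); scaling by the total means the
    # input does not have to sum to exactly 1.0.
    r = random.random() * cumulative[-1]
    # Index of the first running total strictly greater than r.
    index = bisect.bisect_right(cumulative, r)
    # Guard against floating-point round-off pushing r up to the total.
    return min(index, len(probs) - 1)

index = draw_bisect([0.1, 0.2, 0.5, 0.2])
```

If many draws are needed, the `cumulative` list can be computed once and reused, which leaves only the O(log n) binary search per draw.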
I've never used numpy, but I assume my code below (Python only) does the same thing as what you accomplished in one line. I'm putting it here just in case you want it.
It looks very C-ish, so apologies for not being very Pythonic.
weight_total would be 1 for you.
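The code this answer refers to did not survive, so here is a hedged reconstruction of a plain-Python linear scan in the same spirit. Only the name `weight_total` comes from the answer; `draw_linear` and the other names are assumptions.

```python
import random

def draw_linear(weights):
    # Sum of all weights; this is 1 when the weights are probabilities.
    weight_total = sum(weights)
    # Uniform number in [0, weight_total).
    threshold = random.random() * weight_total
    running = 0.0
    for index, weight in enumerate(weights):
        running += weight
        # Stop at the first element whose running total exceeds the threshold.
        if threshold < running:
            return index
    # Only reached through floating-point round-off.
    return len(weights) - 1

index = draw_linear([0.1, 0.2, 0.5, 0.2])
```

This takes O(n) per draw, which is fine for short arrays and needs nothing beyond the standard library.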
How it works:
Compute the cumulative sum of the weights, cutoffs.
Compute a uniformly distributed random number in the half-open interval [0, cutoffs[-1]).
Use searchsorted to find the index where the random number would be inserted into cutoffs.
Return choices[idx], where idx is that index.
Use numpy.random.multinomial - it is the most efficient.
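The searchsorted steps listed under "How it works" can be sketched as follows. The names `cutoffs`, `choices`, and `idx` come from that description; the function name `weighted_pick`, the `weights` parameter, and the use of `side="right"` (so that zero-weight entries are skipped) are assumptions rather than the answerer's code.

```python
import numpy as np

def weighted_pick(weights, choices):
    # Step 1: cumulative sum of the weights.
    cutoffs = np.cumsum(weights)
    # Step 2: uniform random number in the half-open interval [0, cutoffs[-1]).
    r = np.random.uniform(0.0, cutoffs[-1])
    # Step 3: index where r would be inserted into cutoffs.
    idx = np.searchsorted(cutoffs, r, side="right")
    # Guard against round-off landing exactly on the last cutoff.
    idx = min(idx, len(choices) - 1)
    # Step 4: return the matching choice.
    return choices[idx]

picked = weighted_pick([0.1, 0.2, 0.5, 0.2], ["a", "b", "c", "d"])
```

To get an index rather than an element, as the question asks, pass `range(len(weights))` as `choices`.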