Hi I'm doing some code for a genomics class and I am having difficulty on a certain part.
I have a set of mutually exclusive events with probabilities
I want to simulate randomly sampling an event n times with the given probability.
input: probabilities = {0.3, 0.2, 0.5} events{e1,e2,e3} n=100
output: there should be ~50 results for e3, ~20 for e2 and ~30 for e1. Note that these are probably not exactly 50, 20, 30 because empirical values are different from theoretical values...
Let's assume that we have three events, each with probabilities .3, .2 and .5, respectively. Then for each sample generated, we generate a number in the range [0,1), let's call this "rand." If "rand" < .3, we generate event 1, if .3 <= "rand" < .5, we generate even 2, otherwise we generate event 3. This can be accomplished using random(), which indeed generates a number in the range [0,1).
Python doesn't have any weighted sampling functionality built in (NumPy/SciPy does), but for a really simple case like this, it's pretty easy:
If you don't have Python 3.2+, you don't have the
accumulate
function; you can fake it with an inefficient one-liner if the list really is this short:… or you can write an explicit loop, or an ugly
reduce
call, or copy the equivalent Python function from the docs.Also, note that
random.uniform(0, totals[-1])
is just a more complicated way of writingrandom.random()
if you can be sure that your numbers add up to 1.0.A quick way to test this:
Those are pretty close to 30%, 20%, and 50% of 100000, respectively.