I have a file with some probabilities for different values e.g.:
1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2
I would like to generate random numbers using this distribution. Does an existing module handle this? It's fairly simple to code on your own (build the cumulative distribution function, generate a random value in [0, 1) and pick the corresponding value), but it seems like this should be a common problem, and someone has probably created a function or module for it.
I need this because I want to generate a list of birthdays (which do not follow any distribution in the standard random module).
Maybe it is kind of late, but you can use numpy.random.choice(), passing the p parameter:
Verification:
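The snippets from this answer are not reproduced here. A minimal sketch of both the call and a frequency check, assuming the values and probabilities from the question (the variable names and the seed are illustrative choices, not the answerer's):

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6]
probabilities = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]  # must sum to 1

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
samples = np.random.choice(values, size=100_000, p=probabilities)

# Verification: empirical frequencies should be close to the probabilities.
uniq, counts = np.unique(samples, return_counts=True)
frequencies = dict(zip(uniq.tolist(), (counts / samples.size).tolist()))
print(frequencies)
```

With a sample this large, each printed frequency should land near the corresponding entry of probabilities.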
(OK, I know you are asking for shrink-wrap, but maybe those home-grown solutions just weren't succinct enough for your liking. :-)
I pseudo-confirmed that this works by eyeballing the output of this expression:
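The expression itself is missing here. As a stand-in, here is a rough sketch of a home-grown sampler along the lines the question describes (running totals plus one uniform draw). It is not the answerer's code; pdf, cumulative and draw are hypothetical names:

```python
import bisect
import itertools
import random
from collections import Counter

pdf = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
values = [v for v, _ in pdf]
# Running totals of the probabilities (the discrete CDF).
cumulative = list(itertools.accumulate(p for _, p in pdf))

def draw():
    """Return one value, picked according to the probabilities in pdf."""
    r = random.random() * cumulative[-1]
    i = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding at the top end.
    return values[min(i, len(values) - 1)]

# Eyeball check: counts should be roughly proportional to the probabilities.
print(sorted(Counter(draw() for _ in range(10_000)).items()))
```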
Another answer, probably faster :)
Here is a more effective way of doing this:
Just call the following function with your 'weights' array (taking the indices as the corresponding items) and the number of samples needed. The function can easily be modified to handle ordered pairs.
Returns indexes (or items) sampled/picked (with replacement) using their respective probabilities:
A short note on the concept used in the while loop: beta is a running value built up from uniformly random increments. We subtract the current item's weight from beta and advance the current index until we reach the item whose weight covers what remains of beta.
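The function this answer refers to is not shown here. The description resembles what is often called a "resampling wheel", so here is a minimal sketch under that assumption. sample_indices and its parameter names are hypothetical, and all weights are assumed to be positive:

```python
import random

def sample_indices(weights, n_samples):
    """Return n_samples indices picked (with replacement), roughly in
    proportion to weights. Assumes every weight is positive."""
    n = len(weights)
    max_weight = max(weights)
    index = random.randrange(n)  # random starting position on the wheel
    beta = 0.0
    picked = []
    for _ in range(n_samples):
        # Advance by a random amount, uniform in [0, 2 * max_weight].
        beta += random.uniform(0.0, 2.0 * max_weight)
        # Subtract item weights until beta falls inside the current item.
        while beta > weights[index]:
            beta -= weights[index]
            index = (index + 1) % n
        picked.append(index)
    return picked

weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
print(sample_indices(weights, 10))
```

Note that consecutive picks from this scheme are not independent of each other, so it fits best when only the overall proportions matter.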
scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.
As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice().
If you are using Python 3.6 or above, you can use random.choices() from the standard library; see the answer by Mark Dickinson.
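A short sketch of the scipy.stats.rv_discrete route and the random.choices() alternative, assuming the distribution from the question (the distribution name "birthdays" and the variable names are illustrative only):

```python
import random
from scipy import stats

values = [1, 2, 3, 4, 5, 6]
probabilities = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

# scipy: values=(xk, pk), where xk are integers and pk sum to 1.
dist = stats.rv_discrete(name="birthdays", values=(values, probabilities))
scipy_samples = dist.rvs(size=10)
print(scipy_samples)

# Standard library (Python 3.6+): weights are relative, k is the sample count.
stdlib_samples = random.choices(values, weights=probabilities, k=10)
print(stdlib_samples)
```

Both calls draw with replacement. random.choices() does not require the weights to sum to 1, whereas rv_discrete expects proper probabilities.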