I want to generate a random number with a given probability but I'm not sure how to:
I need a number between 1 and 3
num = ceil(rand*3);
but I need different values to have different probabilities of generating eg.
0.5 chance of 1
0.1 chance of 2
0.4 chance of 3
I'm sure this is straightforward but I can't think of how to do it.
The simple solution is to generate a number with a uniform distribution (using rand
), and manipulate it a bit:
r = rand;
prob = [0.5, 0.1, 0.4];
x = sum(r >= cumsum([0, prob]));
or in a one-liner:
x = sum(rand >= cumsum([0, 0.5, 0.1, 0.4]));
Explanation
Here r
is a uniformly distributed random number between 0 and 1. To generate an integer number between 1 and 3, the trick is to divide the [0, 1] range into 3 segments, where the length of each segment is proportional to its corresponding probability. In your case, you would have:
- Segment [0, 0.5), corresponding to number 1.
- Segment [0.5, 0.6), corresponding to number 2.
- Segment [0.6, 1], corresponding to number 3.
The probability of r
falling within any of the segments is proportional to the probabilities you want for each number. sum(r >= cumsum([0, prob]))
is just a fancy way of mapping an integer number to one of the segments.
Extension
If you're interested in creating a vector/matrix of random numbers, you can use a loop or arrayfun
:
r = rand(3); % # Any size you want
x = arrayfun(@(z)sum(z >= cumsum([0, prob])), r);
Of course, there's also a vectorized solution, I'm just too lazy to write it.
The answers so far are correct, but slow for large inputs: O(m*n) where n is the number of values and m is the number of random samples. Here is a O(m*log(n)) version that takes advantage of monotonicity of the cumsum
result and the binary search used in histc
:
% assume n = numel(prob) is large and sum(prob) == 1
r = rand(m,1);
[~,x] = histc(r,cumsum([0,prob]));
>> c = cumsum([0.5, 0.1, 0.4]);
>> r = rand(1e5, 1);
>> x = arrayfun(@(x) find(x <= c, 1, 'first'), r);
>> h = hist(x, 1:3)
h =
49953 10047 40000
x
distributed as desired.
using randsample
function from Statistics and Machine Learning Toolbox, you can generate random numbers with specified probability mass function (pmf):
pmf = [0.5, 0.1, 0.4];
population = 1:3;
sample_size = 1;
random_number = randsample(population,sample_size,true,pmf);
I think this is the easiest method.
A slightly more general solution would be:
r=rand;
prob=[.5,.1,.4];
prob=cumsum(prob);
value=[1,2,3]; %values corresponding to the probabilities
ind=find(r<=prob,1,'first');
x=value(ind)