Full Question is:
Write a method to randomly generate a set of m integers from an array of size
n
. Each element must have equal probability of being chosen`
This question is picked from "Crack the coding interview" and the solution is this:
We can swap the element with an element at the beginning of the array and then “remember” that the array now only includes elements
j
and greater. That is, when we picksubset[0]
to bearray[k]
, we replacearray[k]
with the first element in the array. When we picksubset[1]
, we considerarray[0]
to be “dead” and we pick a random elementy
between 1 and arraysize()
. We then set subset[1] equal toarray[y]
, and setarray[y]
equal to array[1]. Elements 0 and 1 are now “dead”.Subset[2]
is now chosen fromarray[2]
througharray[array size()]
, and so on.
My question is that if we're shrinking the array from which we're picking random numbers then probability of each number being picked 1/remaining_num_elements
. How does it stay equal for all the elements?
Yes, you're right but
1/remaining_num_elements
is the probability of an element getting picked up in a particular turn. Here we are interested in probability of an element ultimately getting picked up inm
turns. which remains same for all then
elements.Question you need to ask is: Do each of
n
elements get a fair and equal chance of getting picked up inm
turns? Answer is yes because,P(an element gets finally picked up in set of
m
elements) = P(element gets picked up in 1st turn) +P(element doesn't get picked up in 1st turn) * P(element gets picked up in 2nd turn) +
P(element doesn't get picked up in 1st 2 turns) * P(element gets picked up in 3rd turn) + ... and so on
which, If you calculate, remains same for all the
n
elements initially present.The difference you are seeing in the probability is due to the fact what it is conditional property ( the fact something was selected already and in the last fetch this element was not selected or fetched already). However, over probability , or equal fairness to select any given ball overall does not change.
Think about it like you're picking
m
random numbers from a bag ofn
numbers, with the firstj
elements representingthe numbers in your hand
and the remaining elements representingthe numbers still in the bag
. (You iteratej
from 0 tom - 1
to pull out numbers, as your book suggests.j
, then, represents the number of integers that you have already pulled out of the bag.)If you're picking
m
integers from a bag in real life, then every time you pick a new number, you take from the bag only and not from your hand. Thus, theremaining_num_elements
shrinks at each step.When you think this way, it should be simple to see that this method guarantees that each element has equal probability of being chosen.