Full Question is:
Write a method to randomly generate a set of m integers from an array of size n
. Each element must have equal probability of being chosen`
This question is picked from "Crack the coding interview" and the solution is this:
We can swap the element with an element at the beginning of the array and then “remember” that the array now only includes elements j
and greater. That is, when we pick subset[0]
to be array[k]
, we replace array[k]
with the first element in the array. When we pick subset[1]
, we consider array[0]
to be “dead” and we pick a random element y
between 1 and array size()
. We then set subset[1] equal to array[y]
, and set array[y]
equal to array[1]. Elements 0 and 1 are now “dead”. Subset[2]
is now chosen from array[2]
through array[array size()]
, and so on.
My question is that if we're shrinking the array from which we're picking random numbers then probability of each number being picked 1/remaining_num_elements
. How does it stay equal for all the elements?
Think about it like you're picking m
random numbers from a bag of n
numbers, with the first j
elements representing the numbers in your hand
and the remaining elements representing the numbers still in the bag
. (You iterate j
from 0 to m - 1
to pull out numbers, as your book suggests. j
, then, represents the number of integers that you have already pulled out of the bag.)
If you're picking m
integers from a bag in real life, then every time you pick a new number, you take from the bag only and not from your hand. Thus, the remaining_num_elements
shrinks at each step.
When you think this way, it should be simple to see that this method guarantees that each element has equal probability of being chosen.
if we're shrinking the array from which we're picking random numbers then probability of each number being picked 1/remaining_num_elements.
Yes, you're right but 1/remaining_num_elements
is the probability of an element getting picked up in a particular turn. Here we are interested in probability of an element ultimately getting picked up in m
turns. which remains same for all the n
elements.
Question you need to ask is: Do each of n
elements get a fair and equal chance of getting picked up in m
turns?
Answer is yes because,
P(an element gets finally picked up in set of m
elements) =
P(element gets picked up in 1st turn) +
P(element doesn't get picked up in 1st turn) * P(element gets picked up in 2nd turn) +
P(element doesn't get picked up in 1st 2 turns) * P(element gets picked up in 3rd turn) + ... and so on
which, If you calculate, remains same for all the n
elements initially present.
The difference you are seeing in the probability is due to the fact what it is conditional property ( the fact something was selected already and in the last fetch this element was not selected or fetched already). However, over probability , or equal fairness to select any given ball overall does not change.