Design a fast algorithm to repeatedly generate numbers from the
discrete distribution: given an array a[] of nonnegative real numbers
that sum to 1, the goal is to return index i with probability a[i].
I found this question in an online algorithms book, Introduction to Programming in Java, chapter 4.2: Sorting and Searching (http://introcs.cs.princeton.edu/java/42sort/).
The hint says:
Form an array s[] of cumulative sums such that s[i] is the sum of the first i elements of a[]. Now, generate a random real number r between 0 and 1, and use binary search to return the index i for which s[i] ≤ r < s[i+1].
Somehow I am not able to understand the hint, and hence can't find the solution.
There are many ways to answer this problem. This article describes numerous approaches, their strengths, their weaknesses, and their runtimes. It concludes with an algorithm that takes O(n) preprocessing time, then generates numbers in time O(1) each.
The particular approach you're looking for is described under "roulette wheel selection."
Hope this helps!
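In code, the cumulative-sum technique from the hint can be sketched like this. This is a minimal Python version using the standard-library bisect module; the name make_sampler is my own, not from the book:

```python
import bisect
import random

def make_sampler(a):
    """Precompute cumulative sums of a[] in O(n); return an O(log n) sampler."""
    s = []
    total = 0.0
    for w in a:
        total += w
        s.append(total)

    def sample():
        # r lands in the half-open interval [s[i-1], s[i]) with
        # probability a[i]; bisect_right finds that i by binary search
        r = random.random() * total
        return bisect.bisect_right(s, r)

    return sample

sample = make_sampler([0.1, 0.4, 0.2, 0.3])
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[sample()] += 1
# counts will be roughly proportional to [0.1, 0.4, 0.2, 0.3]
```

Because `random.random()` returns a value in [0, 1), the scaled r is always strictly less than the final cumulative sum, so `bisect_right` always returns a valid index.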
Here's a Python algorithm which implements the 'roulette wheel' technique. It's tough to explain without a graphic; the article linked by templatetypedef should do well for that. Also, note that this algorithm doesn't actually require the weights to be normalized (they don't need to sum to 1), though the example below uses weights that are.
import random

trials = 50
selected_indices = []

# weights on each index
distrib = [0.1, 0.4, 0.2, 0.3]

# start the wheel pointer at a random index
index = random.randrange(len(distrib))
max_weight = max(distrib)
B = 0

# generate 'trials' random indices
for i in range(trials):
    # advance B by a random amount, up to twice our largest weight
    B = B + random.uniform(0, 2 * max_weight)
    # continue stepping through the wheel until B lands 'within' a weight
    while B > distrib[index]:
        B = B - distrib[index]
        index = (index + 1) % len(distrib)
    selected_indices.append(index)

print("Randomly selected indices from {0} trials".format(trials))
print(selected_indices)
This is a snippet from wakkerbot/megahal. Here the weights are (unsigned) ints, and their sum is in node->childsum. For maximum speed, the children are sorted (more or less) in descending order. (the weights are expected to have a power-law like distribution, with only a few high weights and many smaller ones)
/*
 * Choose a symbol at random from this context,
 * weighted by ->thevalue.
 */
credit = urnd( node->childsum );
for (cidx = 0; 1; cidx = (cidx+1) % node->branch) {
    symbol = node->children[cidx].ptr->symbol;
    if (credit < node->children[cidx].ptr->thevalue) break;
    /* 20120203 if (node->children[cidx].ptr->thevalue == 0) credit--; */
    credit -= node->children[cidx].ptr->thevalue;
}
done:
// fprintf(stderr, "{+%u}", symbol );
return symbol;
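For readers who don't want to parse the C, the same linear-scan idea can be sketched in Python, assuming integer weights as in the snippet above (the name weighted_pick is mine, not from wakkerbot):

```python
import random

def weighted_pick(weights):
    """Return index i with probability weights[i] / sum(weights)."""
    # draw an integer 'credit' uniformly from [0, sum(weights))
    credit = random.randrange(sum(weights))
    # walk the weights, spending credit until it lands inside one
    for i, w in enumerate(weights):
        if credit < w:
            return i
        credit -= w
    raise AssertionError("unreachable when sum(weights) > 0")

# sorting weights in descending order shortens the average scan,
# which is the point the snippet makes about power-law distributions
print(weighted_pick([40, 30, 20, 10]))
```

Unlike the C version, this restarts the scan from index 0 each call; the C code keeps `cidx` across iterations and wraps with `% node->branch`, but the selection probabilities are the same.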
Depending on the granularity, you can create an index with 100, 1000 or 10000 elements. Given a distribution (a,b,c,d) with p=(10%, 20%, 30%, 40%), we create a Map:
val prob = Map('a' -> 10, 'b' -> 20, 'c' -> 30, 'd' -> 40)
val index = (for (e <- prob;
                  i <- (1 to e._2)) yield e._1).toList
index: List[Char] = List(a, a, a, a, a, a, a, a, a, a,
b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b,
c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c,
d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d, d)
We can now pick an element of desired probability very fast:
val x = index (random.nextInt (100))
x is now 'd' with probability 40%, 'a' with 10%, and so on. Short setup, fast access.
The numbers don't even need to sum to 100; you just have to calculate the range once:
val max = prob.map (_._2).sum
val x = index (random.nextInt (max))
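The same lookup-table idea can be sketched in Python for comparison (a minimal sketch; the variable names mirror the Scala above):

```python
import random

prob = {'a': 10, 'b': 20, 'c': 30, 'd': 40}

# expand each key 'weight' times into a flat list;
# this costs O(sum of weights) space but gives O(1) sampling
index = [k for k, w in prob.items() for _ in range(w)]

total = len(index)  # the weights need not sum to 100
x = index[random.randrange(total)]  # 'd' with probability 40/total, etc.
```

The trade-off versus the cumulative-sum approach is memory: the table grows with the sum of the weights (or with the chosen granularity), while prefix sums stay at one entry per outcome.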