Design a fast algorithm to repeatedly generate numbers from the discrete distribution: Given an array a[] of non negative real numbers that sum to 1, the goal is to return index i with probability a[i]
I found this question in a an online algorithm book, Introduction to Programming in Java, chapter 4.2: Sorting and Searching (http://introcs.cs.princeton.edu/java/42sort/) .
the hint says:
Form an array s[] of cumulated sums such that s[i] is the sum of the first i elements of a[]. Now, generate a random real number r between 0 and 1, and use binary search to return the index i for which s[i] ≤ s[i+1].
some how I am not able to understand the hint and hence cant find the solution..
Depending on the granularity, you can create an Index with 100, 1000 or 10000 Elements. Assume a distribution (a,b,c,d) with p=(10%, 20%, 30%, 40%), we create a Map:
We can now pick an element of desired probability very fast:
x is now by 40% d, by 10% a and so on. Short setup, fast access.
The numbers don't even need to sum up to 100, but you have to calculate the range once, then:
Here's a Python algorithm which implements the 'Roulette Wheel' Technique. It's tough to explain without a graphic. The article linked to by templatetypedef should do well for that. Also, note that this algorithm doesn't actually require the weights to be normalized (they don't need to sum to 1), but this will work nonetheless.
There are many ways to answer this problem. This article describes numerous approaches, their strengths, their weaknesses, and their runtimes. It concludes with an algorithm that takes O(n) preprocessing time, then generates numbers in time O(1) each.
The particular approach you're looking for is described under "roulette wheel selection."
Hope this helps!
This is a snippet from wakkerbot/megahal. Here the weights are (unsigned) ints, and their sum is in node->childsum. For maximum speed, the children are sorted (more or less) in descending order. (the weights are expected to have a power-law like distribution, with only a few high weights and many smaller ones)