Sorry about badly phrased title....
I have an object called NGram
class NGram
{
//other properties
double Probability {get; set;} //Value between 1 and 0
}
Now suppose I have a list of these objects such that...
List<NGrams> grams = GetNGrams();
Debug.Assert(grams.Sum(x => x.Probability) == 1);
How can I select a random item from this list while factoring in the probability distribution.
For instance, suppose grams[0].Probability == 0.5
then there should be a 50% chance of selecting grams[0]
I figured I may need something like rand.NextDouble()
but I am at loss.
Here's a more generic way to do it (meaning you don't need to assert that the probabilities add to 1):
static Random rand = new Random();
public NGram GetRandom(IEnumerable<NGram> pool)
{
// get universal probability
double u = pool.Sum (p => p.Probability);
// pick a random number between 0 and u
double r = rand.NextDouble() * u;
double sum = 0;
foreach(NGram n in pool)
{
// loop until the random number is less than our cumulative probability
if(r <= (sum = sum + n.Probability))
{
return n;
}
}
// should never get here
return null;
}
Sort the list, order by Probability, ascending.
Sum the Probability field for all elements in the list. Let's call that sum P.
Get a random number between [0, P], let's call it r
Iterate the list while keeping an accumulate value of the sum of Probability up to the current element you are iterating (pe). End the search when find the first element for which pe >= r
The case of all elements in the array to sum 1 is now just an special case :)
Try this:
List<NGram> grams = new List<NGram>()
{
new NGram() { Probability = 0.5 },
new NGram() { Probability = 0.35 },
new NGram() { Probability = 0.15 }
};
var rnd = new Random();
var result =
grams
.Aggregate(
new { sum = 0.0, target = rnd.NextDouble(), gram = (NGram)null },
(a, g) =>
a.gram == null && a.sum + g.Probability >= a.target
? new { sum = a.sum + g.Probability, a.target, gram = g }
: new { sum = a.sum + g.Probability, a.target, a.gram });
It gives me results like this:
In pseudo code
r = Get a random number between 0 and 1
sum = 0
i = 0
Loop
sum = sum + grams[i].Probability
If sum >= r Then
Exit Loop
End
i = i + 1
End
i is the index of the random item in the list
The idea is to sum up the item's probability until the sum is greater or equal than a random number. Since the probabilities sum up to 1 and the random number is in the range 0 .. 1, you will find an item in any case. Items with a greater probability are more likely to be chosen.
∑P= 0 0.08 0.3 0.43 0.53 0.88 1
+--+--------+----+---+-------------+----+
| | | | | | |
+--+--------+----+---+-------------+----+
i = 0 1 2 3 4 5
You can imagine each item to be of a length equal to its assigned probability. The algorithm is like randomly throwing a dart at a ruler of length one, with all the probabilities stacked up along the ruler. The probability for an item to be hit is proportional to its size (i.e. its assigned probability).