Weighted random number generation

Posted 2019-05-19 00:53

Question:

I would like to generate weighted random numbers in an exact manner. An example shows what I mean by exact: my input array is [1, 2, 3] and the weights are also [1, 2, 3]. In that case I expect to see 1 once, 2 twice and 3 three times, for example 3 -> 2 -> 3 -> 1 -> 3 -> 2...

I am using rand() to get a number in the range [0, sum_of_weights), where sum_of_weights = 1 + 2 + 3 = 6 for the example above. I searched for existing solutions on the Internet, but the results are not what I want. Sometimes I get 2 more than twice and no 1 at all in the sequence. The output is still weighted, but it does not give exactly the number of occurrences I expected.

I am not sure what is wrong with my code below. Am I doing something wrong, or should I try a totally different approach? Thanks for your answers.

int random_t (int items[], int items_weight[], int number_of_items)  
{   
    double random_weight;  
    double sum_of_weight = 0;
    int i;

    /* Calculate the sum of weights */  
    for (i = 0; i < number_of_items; i++) {
        sum_of_weight += items_weight[i];
    }

    /* Choose a random number in the range [0,1) */
    srand(time(NULL));
    double g = rand() / ( (double) RAND_MAX + 1.0 );
    random_weight = g * sum_of_weight;

    /* Find a random number wrt its weight */
    int temp_total = 0;

    for (i = 0; i < number_of_items; i++)
    {
        temp_total += items_weight[i];

        if (random_weight < temp_total)
        {
            return items[i];
        }
    }
    return -1; /* Oops, we could not find a random number */
}

I also tried something different (the code is below). It works for my case, but integer overflow and the extensive use of static variables make it problematic.

You pass the input array on the first call, then pass NULL on later calls to keep working with the same array, somewhat like strtok().

int random_w(int *arr, int weights[], int size)
{
    int selected, i;
    int totalWeight;
    double ratio;
    static long int total;
    static long int *eachTotal = NULL;
    static int *local_arr = NULL;
    static double *weight = NULL;

    if (arr != NULL)
    {
        free(eachTotal);
        free(weight);
        eachTotal = (long int*) calloc(size, sizeof(long));
        weight = (double*) calloc(size, sizeof(double));
        total = 0;
        totalWeight = 0;
        local_arr = arr;

        for (i = 0; i < size; i++)
        {
            totalWeight += weights[i];
        }

        for (i = 0; i < size; i++)
        {
            weight[i] = (double)weights[i] / totalWeight;
        }
        srand(time(NULL));
    }

    while (1)
    {
        selected = rand() % size;
        ratio = (double)(eachTotal[selected])/(double)(total+1);
        if (ratio < weight[selected])
        {
            total++;
            eachTotal[selected]++;

            return local_arr[selected];
        }
    }
}

Answer 1:

Is this what you want?

# Weights: one 1, two 2s, three 3s
>>> import random
>>> vals = [1] * 1 + [2] * 2 + [3] * 3
>>> random.shuffle(vals)
>>> vals
[2, 3, 1, 2, 3, 3]

Edit: Whoops, for some reason my mind replaced the C tag with the Python one. Regardless, I think what you want is not "weighted" random number generators, but a shuffle. This ought to help.
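
Since the question is tagged C, here is a minimal sketch of the same expand-then-shuffle idea in C. The helper names (rand_below, expand_weighted, shuffle) are made up for this illustration, and the caller is assumed to seed rand() once with srand() and to supply an output buffer large enough for the sum of the weights.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform integer in [0, n), using rejection to avoid the bias of rand() % n. */
static int rand_below(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    int limit = RAND_MAX - (RAND_MAX % n);  /* a multiple of n */
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}

/* Hypothetical helper: writes items[i] into out weights[i] times.
   Returns the number of elements written; out must hold sum(weights). */
static size_t expand_weighted(const int items[], const int weights[],
                              size_t n, int out[])
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < weights[i]; j++) {
            out[k++] = items[i];
        }
    }
    return k;
}

/* Fisher-Yates shuffle: reorders the array but keeps every count intact. */
static void shuffle(int a[], size_t len)
{
    for (size_t i = len; i > 1; i--) {
        size_t j = (size_t)rand_below((int)i);  /* 0 <= j < i */
        int tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

For the example in the question, expand_weighted fills a 6-element buffer with 1, 2, 2, 3, 3, 3, and shuffle then puts those six values in random order.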



Answer 2:

When you say you didn't get "exactly" the number of values you expected for each weighted value, how many runs are you talking about? If you only did six runs of any random process, I wouldn't expect you to be able to say definitively whether anything was working or not. Your code may work fine. Try running it a million times and check the results then. Or maybe you actually want what Nathon is talking about: a preweighted list of values, which you can then randomly shuffle and still have the exact weights you're looking for.
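
To make the point concrete: with weights [1, 2, 3], six independent draws land on exactly one 1, two 2s and three 3s with probability 60 × (1/6)(1/3)²(1/2)³ = 5/36, or roughly 14% of the time. A rough sketch of the "run it many times" check follows. The names weighted_index and tally_draws are invented for this example, and it assumes srand() is called once by the caller rather than inside the sampling function.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns index i with probability weights[i] / sum(weights).
   Deliberately does not call srand(); seed once before the first draw. */
static size_t weighted_index(const int weights[], size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += weights[i];
    }
    assert(total > 0);

    /* r is in [0, total) */
    double r = rand() / ((double)RAND_MAX + 1.0) * (double)total;

    long running = 0;
    for (size_t i = 0; i < n; i++) {
        running += weights[i];
        if (r < (double)running) {
            return i;
        }
    }
    return n - 1;  /* not reached when total > 0 */
}

/* Draws `draws` samples and records how often each index came up. */
static void tally_draws(const int weights[], size_t n, long draws, long counts[])
{
    for (size_t i = 0; i < n; i++) {
        counts[i] = 0;
    }
    for (long d = 0; d < draws; d++) {
        counts[weighted_index(weights, n)]++;
    }
}
```

Over a large number of draws the tallies should approach the 1:2:3 ratio, while any short run can deviate from it noticeably.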



Answer 3:

You can sample from a multinomial distribution. Your universe of random samples (or "urn of balls in a bucket") is {1, 2, 3} and the probabilities ("weights") of observing each are, respectively, {1/6, 2/6, 3/6}.

For demonstration purposes, a Perl script can give you a list of observations of labeled balls with these probabilities:

#!/usr/bin/perl

use strict;
use warnings;
use Math::Random qw(random_multinomial);
use Data::Dumper;

my $events = 10;
my @probabilities = qw(0.167 0.333 0.5);
my @observations = random_multinomial($events, @probabilities);

print Dumper \@observations;

For 10 events, a single trial will return something like:

$VAR1 = 1;
$VAR2 = 2;
$VAR3 = 7;

This means you have (from this single trial) one 1-labeled event, two 2-labeled events, and seven 3-labeled events.

If you repeat the trial, you may get a different distribution of 1, 2 and 3-labeled events.

From these counts you can trivially build the equivalent list {1, 2, 2, 3, 3, 3, 3, 3, 3, 3}.

Just randomly shuffle this list to get your weighted, observed list of random numbers.



Answer 4:

If you want the sample frequencies to be completely deterministic, I think the way to go is to generate an array that has the proper number of occurrences for each value, then do a random shuffle (which preserves the frequencies) and take successive elements of the shuffled array as your random sequence.
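
One possible way to package this in C without static variables is a small "bag" structure that hands out successive elements and reshuffles whenever it runs out. This is only a sketch: shuffle_bag, bag_init, bag_next and bag_free are hypothetical names, and reshuffling after every full pass is an assumption about the desired behaviour, not something the question specifies.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *values;  /* each item repeated weight times */
    size_t len;   /* sum of all weights */
    size_t next;  /* position of the next element to hand out */
} shuffle_bag;

/* Uniform integer in [0, n), using rejection to avoid the bias of rand() % n. */
static size_t rand_below(size_t n)
{
    assert(n > 0 && n <= (size_t)RAND_MAX);
    int limit = RAND_MAX - (RAND_MAX % (int)n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return (size_t)r % n;
}

/* Returns 0 on success, -1 if memory allocation fails. */
static int bag_init(shuffle_bag *bag, const int items[], const int weights[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(weights[i] >= 0);
        total += (size_t)weights[i];
    }
    assert(total > 0);

    bag->values = malloc(total * sizeof *bag->values);
    if (bag->values == NULL) {
        return -1;
    }
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < weights[i]; j++) {
            bag->values[k++] = items[i];
        }
    }
    bag->len = total;
    bag->next = total;  /* forces a shuffle on the first draw */
    return 0;
}

/* Hands out the next element, reshuffling at the start of every pass. */
static int bag_next(shuffle_bag *bag)
{
    if (bag->next == bag->len) {
        for (size_t i = bag->len; i > 1; i--) {  /* Fisher-Yates shuffle */
            size_t j = rand_below(i);
            int tmp = bag->values[i - 1];
            bag->values[i - 1] = bag->values[j];
            bag->values[j] = tmp;
        }
        bag->next = 0;
    }
    return bag->values[bag->next++];
}

static void bag_free(shuffle_bag *bag)
{
    free(bag->values);
    bag->values = NULL;
}
```

With items [1, 2, 3] and weights [1, 2, 3], every consecutive block of six calls to bag_next contains exactly one 1, two 2s and three 3s, in a random order within the block.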



Answer 5:

OK, my answer will sound like a hack, but short of writing your own distribution, maybe you can map a uniform distribution and leverage Boost (check out http://www.boost.org/doc/libs/1_44_0/doc/html/boost_random/reference.html#boost_random.reference.distributions).

so following your example:

  • 1 -> 1
  • 2, 3 -> 2
  • 4, 5, 6 -> 3
  • 7, 8, 9, 10 -> 4 (etc...)

Then generate a random number between 1 and 10 and return the mapped element. Boost's uniform_int distribution gives you the number, which you then map.

here is an example of generating the numbers; you would then need to map the results:

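#include <iostream>
#include <ctime>
#include <boost/random.hpp>

int main()
{
    // Uniform integers in the closed range [1, 10], matching the mapping above
    boost::uniform_int<> distribution(1, 10);
    boost::mt19937 engine;
    // Seed once; the cast selects the integer overload of seed()
    engine.seed(static_cast<unsigned int>(std::time(NULL)));
    boost::variate_generator<boost::mt19937, boost::uniform_int<> > myrandom(engine, distribution);

    std::cout << myrandom() << std::endl;

    return 0;
}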
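
The mapping step itself might look like the following in plain C, to match the question's tag. This is a sketch under assumptions: map_to_item is a hypothetical name, r is expected to be a uniform integer in [1, sum of weights] produced by whatever generator you use, and -1 is used as an out-of-range marker as in the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Maps r in [1, sum(weights)] to an item: the first weights[0] values of r
   map to items[0], the next weights[1] values to items[1], and so on.
   Returns -1 if r is outside that range. */
static int map_to_item(int r, const int items[], const int weights[], size_t n)
{
    assert(items != NULL && weights != NULL);
    if (r < 1) {
        return -1;
    }
    int upper = 0;  /* largest r that maps to the current item */
    for (size_t i = 0; i < n; i++) {
        upper += weights[i];
        if (r <= upper) {
            return items[i];
        }
    }
    return -1;
}
```

With items [1, 2, 3, 4] and weights [1, 2, 3, 4], this reproduces the table above: 1 maps to 1, 2 and 3 map to 2, 4 through 6 map to 3, and 7 through 10 map to 4.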


Tags: c random