I would like to generate weighted random numbers in an exact manner. To explain what I mean by exact: my input array is [1, 2, 3] and the weights are also [1, 2, 3]. In that case I expect to see 1 once, 2 twice and 3 three times, for example 3 -> 2 -> 3 -> 1 -> 3 -> 2...
I am using rand() to get a number in the range [0, sum_of_weights), where sum_of_weights = 1 + 2 + 3 = 6 for the example above. I searched the Internet for existing solutions, but the results are not what I want. Sometimes I get 2 more than twice and no 1 at all in the sequence. The output is still weighted, but it does not give exactly the number of occurrences I expected.
I am not sure what is wrong with my code below. Am I doing something wrong, or should I try a totally different approach? Thanks for your answers.
#include <stdlib.h>
#include <time.h>

int random_t (int items[], int items_weight[], int number_of_items)
{
    static int seeded = 0;
    double random_weight;
    double sum_of_weight = 0;
    int i;
    /* Calculate the sum of weights */
    for (i = 0; i < number_of_items; i++) {
        sum_of_weight += items_weight[i];
    }
    /* Seed only once: reseeding with time(NULL) on every call makes
       all calls within the same second return the same value */
    if (!seeded) {
        srand((unsigned) time(NULL));
        seeded = 1;
    }
    /* Choose a random number in the range [0,1) */
    double g = rand() / ( (double) RAND_MAX + 1.0 );
    random_weight = g * sum_of_weight;
    /* Find a random number wrt its weight */
    int temp_total = 0;
    for (i = 0; i < number_of_items; i++)
    {
        temp_total += items_weight[i];
        if (random_weight < temp_total)
        {
            return items[i];
        }
    }
    return -1; /* Oops, we could not find a random number */
}
I also tried something different (the code is below). It works for my case, but integer overflow and the extensive use of static variables make it problematic.
You pass the input array on the first call, then pass NULL on later calls to keep working with the same array, somewhat like strtok().
int random_w(int *arr, int weights[], int size)
{
    int selected, i;
    int totalWeight;
    double ratio;
    static long int total;
    static long int *eachTotal = NULL;
    static int *local_arr = NULL;
    static double *weight = NULL;
    if (arr != NULL)
    {
        free(eachTotal);
        free(weight);
        eachTotal = (long int*) calloc(size, sizeof(long));
        weight = (double*) calloc(size, sizeof(double));
        total = 0;
        totalWeight = 0;
        local_arr = arr;
        for (i = 0; i < size; i++)
        {
            totalWeight += weights[i];
        }
        for (i = 0; i < size; i++)
        {
            weight[i] = (double)weights[i] / totalWeight;
        }
        srand(time(NULL));
    }
    while (1)
    {
        selected = rand() % size;
        ratio = (double)(eachTotal[selected])/(double)(total+1);
        if (ratio < weight[selected])
        {
            total++;
            eachTotal[selected]++;
            return local_arr[selected];
        }
    }
}
If you want the sample frequencies to be completely deterministic, I think the way to go is to generate an array that has the proper number of occurrences of each value, then do a random shuffle (which preserves the frequencies) and take successive elements of the shuffled array as your random sequence.
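A minimal sketch of that idea in C, in case it helps. The function names `expand_by_weight` and `shuffle` are made up for illustration, the weights are assumed to be non-negative integers, and `rand() % (i + 1)` has a slight modulo bias that is probably acceptable here:

```c
#include <assert.h>
#include <stdlib.h>

/* Build an array in which items[i] appears weights[i] times.
   Returns a malloc'ed array (caller frees) or NULL on failure;
   the length is stored in *out_len. Names are hypothetical. */
int *expand_by_weight(const int items[], const int weights[], int n, int *out_len)
{
    assert(out_len != NULL);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += weights[i];
    if (total <= 0)
        return NULL;

    int *pool = malloc((size_t)total * sizeof *pool);
    if (pool == NULL)
        return NULL;

    int k = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < weights[i]; j++)
            pool[k++] = items[i];

    *out_len = total;
    return pool;
}

/* Fisher-Yates shuffle: reorders the array in place, so the
   number of occurrences of each value is unchanged. */
void shuffle(int *a, int len)
{
    for (int i = len - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* index in [0, i] */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Call srand() once at program start, expand, shuffle, then read the array from front to back. When you reach the end, shuffle again to start a new round with the same exact frequencies.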
You can sample from a multinomial distribution. Your universe of random samples (or "urn of balls in a bucket") is {1, 2, 3} and the probabilities ("weights") of observing each are, respectively, {1/6, 2/6, 3/6}.
For demonstration purposes, a Perl script can give you a list of observations of labeled balls with these probabilities:
For 10 events, a single trial will return something like:
This means you have (from this single trial) one 1-labeled event, two 2-labeled events, and seven 3-labeled events.
If you repeat the trial, you may get a different distribution of 1-, 2- and 3-labeled events.
You can trivially build a list from this to the equivalent {1, 2, 2, 3, 3, 3, 3, 3, 3, 3} list.
Just randomly shuffle this second list to get your weighted, observed list of random numbers.
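For readers working in C, a single multinomial trial could be sketched roughly as follows. This is not the Perl script referred to above; `multinomial_counts` is a hypothetical name, the weights are assumed to be non-negative integers with a positive sum, and `rand() % total` is used for brevity despite its small modulo bias:

```c
#include <assert.h>
#include <stdlib.h>

/* Run one trial of n_events independent weighted draws.
   On return, counts[i] holds how many times label i was drawn.
   The counts vary from trial to trial; only their sum is fixed. */
void multinomial_counts(const int weights[], int n_labels,
                        int n_events, int counts[])
{
    int total = 0;
    for (int i = 0; i < n_labels; i++) {
        total += weights[i];
        counts[i] = 0;
    }
    assert(total > 0);

    for (int e = 0; e < n_events; e++) {
        int r = rand() % total;      /* uniform-ish in [0, total) */
        int cumulative = 0;
        for (int i = 0; i < n_labels; i++) {
            cumulative += weights[i];
            if (r < cumulative) {    /* r falls in label i's slice */
                counts[i]++;
                break;
            }
        }
    }
}
```

With weights {1, 2, 3} and 10 events, the three counts always add up to 10, but the split between labels changes from run to run.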
Is this what you want?
Edit: Whoops, for some reason my mind replaced the C tag with the Python one. Regardless, I think what you want is not "weighted" random number generators, but a shuffle. This ought to help.
When you say you didn't get "exactly" the number of values you expected for each weighted value, how many runs are you talking about? If you only did six runs of any random process, I wouldn't expect you to be able to definitively say whether anything was working or not. Your code may work fine. Try running it a million times and check the results then. Or maybe you actually want what Nathon is talking about: a preweighted list of values, which you can then randomly shuffle and still have the exact weights you're looking for.
OK, my answer will sound like a hack, but short of writing your own distribution, maybe you can map a uniform distribution and leverage Boost (check out http://www.boost.org/doc/libs/1_44_0/doc/html/boost_random/reference.html#boost_random.reference.distributions)
So, following your example:
Then use Boost's uniform_int distribution to generate a random number between 1 and 10, and return the element it maps to.
Here is an example of generating the numbers; you would then need to map the results:
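Separately from the Boost approach, the mapping step itself can be sketched in plain C, since the question is tagged C. Everything here is an assumption for illustration: the `lookup` table is built for the question's items {1, 2, 3} with weights {1, 2, 3}, the names `map_index` and `pick_mapped` are hypothetical, and `rand()` stands in for Boost's generator:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lookup table: each item appears as many times
   as its weight (items {1, 2, 3}, weights {1, 2, 3}). */
static const int lookup[] = {1, 2, 2, 3, 3, 3};
enum { LOOKUP_LEN = sizeof lookup / sizeof lookup[0] };

/* Map a uniform index in [0, LOOKUP_LEN) to its element. */
int map_index(int idx)
{
    assert(idx >= 0 && idx < LOOKUP_LEN);
    return lookup[idx];
}

/* Draw a uniform index and return the mapped element. */
int pick_mapped(void)
{
    return map_index(rand() % LOOKUP_LEN);
}
```

Note that this samples with replacement, so the frequencies are only weighted on average, not exact over any six draws.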