Avoid basic rand() bias in Monte Carlo simulation?

Published 2019-09-06 16:21

Question:

I am rewriting a Monte Carlo simulation from Objective-C to C, for use in a DLL called from VBA/Excel. The "engine" of the calculation is the generation of a random number in the range 0 to 10000, which is compared to a variable in the 5000-7000 neighbourhood. This happens 400-800 times per iteration, and I run 100,000 iterations, so that is about 50,000,000 random numbers generated per run.

While the Objective-C version showed no bias in tests, I have huge problems with the C code. Objective-C is a superset of C, so 95% of the code was copy-paste and hard to screw up. I have gone through the rest many times, all day yesterday and today, and I have found no problems.

I am left with the difference between arc4random_uniform() and rand() with srand(), especially because of a bias towards the lower numbers of the 0 to 10000 range. The tests I have conducted are consistent with a bias of 0.5 to 2% towards numbers below circa 5000. The only other explanation would be if my code somehow avoided repeats, which I am fairly sure it does not.

The code is really simple ("spiller1evnea" and "spiller2evnea" being numbers between 5500 and 6500):

srand((unsigned)time(NULL));
for (j = 0; j < antala; ++j) {
    [..]
    for (i = 1; i < 450; i++) {
        chance = (rand() % 10001);
        [..]
        if (grey == 1) {
            if (chance < spiller1evnea) vinder = 1;
            else vinder = 2;
        }
        else {
            if (chance < spiller2evnea) vinder = 2;
            else vinder = 1;
        }

Now I don't need true randomness; pseudorandomness is quite fine. I only need it to be approximately evenly distributed on a cumulative basis (it doesn't matter much if 5555 is twice as likely to come out as 5556; it does matter if 5500-5599 is 5% more likely than 5600-5699, and if there is a clear 0.5-2% bias towards 0-4000 over 6000-9999).

First, does it sound plausible that rand() is my problem, and is there an easy implementation that meets my low needs?

EDIT: If my suspicion is plausible, could I use any of this:

http://www.azillionmonkeys.com/qed/random.html

Would I be able to just copy-paste this in as a replacement? (I am writing in C and using Visual Studio, and am a real novice.)

#include <stdlib.h>

#define RS_SCALE (1.0 / (1.0 + RAND_MAX))

double drand (void) {
    double d;
    do {
       d = (((rand () * RS_SCALE) + rand ()) * RS_SCALE + rand ()) * RS_SCALE;
    } while (d >= 1); /* Round off */
    return d;
}

#define irand(x) ((unsigned int) ((x) * drand ()))

EDIT2: Well, clearly the above code works without the same bias, so this would be my recommendation for anyone with the same "middle-of-the-road" need I described above. It does come with a penalty, as it calls rand() three times. So I am still looking for a faster solution.

Answer 1:

The rand() function generates an int in the range [0, RAND_MAX]. If you map this to a smaller range via the modulus operator (%), as your original code does, the result is non-uniform unless the size of your target range happens to evenly divide RAND_MAX + 1. That sounds like exactly what you are seeing.

You have multiple options, but if you want to stick with something based on rand() then I suggest this variation on your original approach:

/*
 * Returns a pseudo-random int selected from the uniform distribution
 * over the half-open interval [0, limit), provided that limit does not
 * exceed RAND_MAX.
 */
int range_rand(int limit) {
    int rand_bound = (RAND_MAX / limit) * limit;
    int r;
    while ((r = rand()) >= rand_bound) { /* empty */ }
    return r % limit;
}

Although in principle the number of rand() calls each call to that function will generate is unbounded, in practice the average number of calls is only slightly greater than 1 for relatively small limit values, and the average is less than 2 for every limit value. It removes the non-uniformity I described earlier by choosing the initial random number from a subset of [0, RAND_MAX] whose size is evenly divided by the limit.