How to get random numbers with the wrong generator

2019-03-28 12:24发布

问题:

Question: Suppose you have a random number generator randn() that returns a uniformly distributed random number between 0 and n-1. Given any number m, write a random number generator that returns a uniformly distributed random number between 0 and m-1.

My answer:

-(int)randm() {
    int k=1;
    while (k*n < m) {
        ++k;
    }
    int x = 0;
    for (int i=0; i<k; ++i) {
        x += randn();
    }
    if (x < m) {
        return x;
    } else {
        return randm();
    }
}

Is this correct?

回答1:

You're close, but the problem with your answer is that there is more than one way to write a number as a sum of two other numbers.

If m<n, then this works because the numbers 0,1,...,m-1 appear each with equal probability, and the algorithm terminates almost surely.

This answer does not work in general because there is more than one way to write a number as a sum of two other numbers. For instance, there is only one way to get 0 but there are many many ways to get m/2, so the probabilities will not be equal.

Example: n = 2 and m=3

0 = 0+0
1 = 1+0 or 0+1
2 = 1+1

so the probability distribution from your method is

P(0)=1/4
P(1)=1/2
P(2)=1/4

which is not uniform.


To fix this, you can use unique factorization. Write m in base n, keeping track of the largest needed exponent, say e. Then, find the biggest multiple of m that is smaller than n^e, call it k. Finally, generate e numbers with randn(), take them as the base n expansion of some number x, if x < k*m, return x, otherwise try again.

Assuming that m < n^2, then

int randm() {

    // find largest power of n needed to write m in base n
    int e=0;
    while (m > n^e) {
        ++e;
    }

    // find largest multiple of m less than n^e
    int k=1;
    while (k*m < n^2) {
        ++k
    }
    --k; // we went one too far

    while (1) {
        // generate a random number in base n
        int x = 0;
        for (int i=0; i<e; ++i) {
            x = x*n + randn(); 
        }
        // if x isn't too large, return it x modulo m
        if (x < m*k) 
            return (x % m);
    }
}


回答2:

It is not correct.

You are adding uniform random numbers, which does not produce a uniformly random result. Say n=2 and m = 3, then the possible values for x are 0+0, 0+1, 1+0, 1+1. So you're twice as likely to get 1 than you are to get 0 or 2.

What you need to do is write m in base n, and then generate 'digits' of the base-n representation of the random number. When you have the complete number, you have to check if it is less than m. If it is, then you're done. If it is not, then you need to start over.



回答3:

The sum of two uniform random number generators is not uniformly generated. For instance, the sum of two dice is more likely to be 7 than 12, because to get 12 you need to throw two sixes, whereas you can get 7 as 1 + 6 or 6 + 1 or 2 + 5 or 5 + 2 or ...

Assuming that randn() returns an integer between 0 and n - 1, n * randn() + randn() is uniformly distributed between 0 and n * n - 1, so you can increase its range. If randn() returns an integer between 0 and k * m + j - 1, then call it repeatedly until you get a number <= k * m - 1, and then divide the result by k to get a number uniformly distributed between 0 and m -1.



回答4:

Assuming both n and m are positive integers, wouldn't the standard algorithm of scaling work?

return (int)((float)randn() * m / n);