Approximating Normal Distribution by adding Random

2019-08-22 03:48发布

问题:

I would like to generate some random numbers which are normally distributed. It’s not mission critical, so a simple algorithm will suffice. I would then like to supply my own mean and standard deviation.

From what I have been able to read, according to the Central Limit Theorem, I should be able to approximate normally distributed random numbers by adding random numbers together.

For example:

rand()+rand()+rand()+rand()+rand()+rand()

where rand() results in an evenly distributed random number from 0 to 1 is a reasonable approximation. (I am aware that technically it’s 0 ≤ rand() < 1).

The expected mean is 6*0.5 so I get to the desired mean with something like this:

(rand()+rand()+rand()+rand()+rand()+rand()-3) + mean

but what would the standard deviation be?

Once I know that, would setting an arbitrary standard deviation simply be a matter of multiplying?

Update

Experimentally, I have found that

(rand()+rand()+rand()+rand()+rand()+rand()-3)*sqrt(2)*sd+mean

gives me a set of data with the desired standard deviation and mean. I have tested this out in a database (PostgreSQL) with a 10 million rows, using the stddev() and avg() aggregate functions, and typical results are close to within 2 decimal places which isn’t too bad.

I have no idea why sqrt(2) is involved …

Solution

OK, thanks to Severin Pappadeux below, I have an answer.

I can generate a reasonable result with:

(rand() + … + rand() - n/2) / sqrt(n/12) * sd + mean

where n is the number of rand() calls I am prepared to make.

回答1:

From what I have been able to read, according to the Central Limit Theorem, I should be able to approximate normally distributed random numbers by adding random numbers together.

That is a correct approach. The only problem is to carefully analyze the tails you're missing.

Let's consider making N(0,1) - gaussian distributed with mean 0 and std.deviation of 1. Then any other gaussian N(\mu, \sigma) is just scale and shift from N(0,1).

So, proposed algorithm for G(0,1) (which is an approximation for N(0,1)) is

G(0,1) = U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1)

where U(0,1) is uniformly distributed random number in the [0,1) range. Lets take a look at the mean.

E(G(0,1)) = 6*E(U(1,0)) = 6*0.5 = 3

which is exactly what you've got. So, to get 0 mean for G(0,1) we have to subtract 3. Lets now check the variance of the G(0,1), we have to make it equal to 1.

V(G(0,1)) = 6*V(U(1,0)) = 6*(1/12) = 1/2

Std.deviation (σ) is square root of variance, so to get it to 1 you have to divide by sqrt(1/2).

So, final expression would be

G(0,1) = (U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) - 3)/sqrt(1/2)

and it is reasonably good approximation of the N(0,1).

I have no idea why sqrt(2) is involved …

Dividing by sqrt(1/2) is the same as multiplying by sqrt(2) - now I hope you know where it came from.

Some simple corollary - for some other n sum of U(0,1) variance multiplier will include term sqrt(n/12).

Another simple corollary - because V(U(0,1)) is equal to 1/12, then summing twelve U(0,1) will not require any multipliers

G(0,1) = Sum_1^12 U(0,1) - 6

is actually often cited in some old sampling recipes books/papers.

You might also want to take a look at related Irwin-Hall distribution and Bates distribution

UPDATE

I've thought about some simplification of the approach. Suppose we want to sum even number of U(0,1), so n=2m. Again, talking about G(0,1) as an approximation for N(0,1)

G(0,1) = (Sum_1^2m U(0,1) - m ) / sqrt(m/6)

Let's rewrite it as

G(0,1) = (Sum_1^m U(0,1) - (m - Sum_1^m U(0,1)))/sqrt(m/6) =
       = (Sum_1^m U(0,1) - Sum_1^m(1 - U(0,1)))/sqrt(m/6)

Due to the fact, that 1 - U(0,1) has the same distribution as U(0,1) we could write G(0,1) in symmetric form

G(0,1) = (Sum_1^m U(0,1) - Sum_1^m U(0,1))/sqrt(m/6) =
       = Sum_1^m (U(0,1) - U(0,1)) / sqrt(m/6)


回答2:

If you use Python, you can use the [numpy][1] library

import numpy
numpy.random.randn()


回答3:

Standard deviation is defined as follows:

where you iterate over N values, which are represented as xi, and use the mean value (xbar). Some JavaScript pseudocode would look like:

var values = [...];
for(var i = 0, var mean; i < values.length; i++) {
   mean += values[i];
}
mean /= values.length;
for(var i = 0, var standardDev; i < values.length) {
   standardDev += Math.pow(values[i] - mean, 2);
}
standardDev = Math.sqrt(standardDev / (values.length - 1));

Theoretically, a good RNG should deviate in a very flat manner, indicating an equal possibility across all values on the RNG's range.