In the example Josh gives of the flawed random method that generates a positive random number with a given upper bound n, I don't understand two of the flaws he states.
The method from the book is:

```java
private static final Random rnd = new Random();

// Common but deeply flawed
static int random(int n) {
    return Math.abs(rnd.nextInt()) % n;
}
```
- He says that if n is a small power of 2, the sequence of random numbers generated will repeat itself after a short period of time. Why is this the case? The documentation for `Random.nextInt()` says: "Returns the next pseudorandom, uniformly distributed int value from this random number generator's sequence." So shouldn't the sequence repeat itself whenever n is a small integer? Why does this only apply to powers of 2?
- Next he says that if n is not a power of 2, some numbers will, on average, be returned more frequently than others. Why does this occur, if `Random.nextInt()` generates uniformly distributed integers? (He provides a code snippet which clearly demonstrates this, but I don't understand why it happens, or how it is related to n being a power of 2.)
1) When `n` is a power of 2, `rnd % n` is equivalent to selecting a few of the lower bits of the original number. The lower bits of numbers produced by the kind of generator Java uses are known to be "less random" than the higher bits; it's simply a property of the formula used to generate the numbers.

2) Imagine that the largest possible value returned by `random()` is 10, and `n = 7`. Taking the value modulo 7 maps the numbers 7, 8, 9 and 10 to 0, 1, 2 and 3 respectively. Therefore, even if the original number is uniformly distributed, the result will be heavily biased towards the lower numbers, because 0 through 3 will appear twice as often as 4, 5 and 6. In this case, that happens regardless of whether `n` is a power of two. But if instead of 10 the largest value were, say, 15 (which is 2^4 - 1), then any `n` that is a power of two would result in a uniform distribution, because the total number of possible values (16) would be exactly divisible by the number of possible remainders, leaving no "excess" numbers at the end of the range to cause bias.

This is not a corollary of anything Josh is saying; rather, it is simply a known property of linear congruential generators: when the modulus is a power of two, the low-order bits have much shorter periods than the high-order bits. (Wikipedia's article on linear congruential generators discusses this.)
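The bias in point 2 can be made concrete with a short, deterministic sketch. It maps each value in 0..10 through `% 7` and tallies the remainders; the range 0..10 stands in for the hypothetical generator from the example above, not Java's actual `Random`:

```java
public class ModuloBiasDemo {
    public static void main(String[] args) {
        int[] counts = new int[7];
        // Tally the remainder of each possible value 0..10 modulo 7.
        for (int value = 0; value <= 10; value++) {
            counts[value % 7]++;
        }
        for (int r = 0; r < 7; r++) {
            System.out.println(r + " occurs " + counts[r] + " time(s)");
        }
        // 0, 1, 2 and 3 each occur twice; 4, 5 and 6 each occur once.
    }
}
```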
This is also noted in the Javadoc: the documentation for `Random.nextInt(int)` points out that linear congruential pseudo-random number generators such as the one implemented by this class are known to have short periods in the sequence of values of their low-order bits, and that the method works around this by treating power-of-two bounds specially, returning the appropriate number of high-order bits from the underlying generator instead. This is a good reason to prefer `Random.nextInt(int)` over calling `Random.nextInt()` and doing your own range transformation.

There are 2^32 distinct numbers that can be returned by `nextInt()`. If you try to put them into n buckets by using `% n`, and n isn't a power of 2, some buckets will have more numbers than others. This means that some outcomes will occur more frequently than others even though the original distribution was uniform.

Let's look at this using small numbers. Let's say `nextInt()` returned four equiprobable outcomes: 0, 1, 2 and 3. Here is what happens if we apply `% 3` to them:

0 % 3 = 0
1 % 3 = 1
2 % 3 = 2
3 % 3 = 0

As you can see, the algorithm would return 0 twice as frequently as it would return each of 1 and 2.
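The same pigeonhole argument scales up to the full int range. The sketch below is plain arithmetic rather than a simulation: for simplicity it treats the 2^32 int bit patterns as values 0..2^32 - 1 and counts how many land in each `% 3` bucket (the real method's `Math.abs` folds negative values differently, but the counting argument is the same):

```java
public class BucketSizes {
    public static void main(String[] args) {
        long total = 1L << 32; // 2^32 distinct int bit patterns
        int n = 3;
        long base = total / n;  // minimum size of each bucket
        long extra = total % n; // how many buckets receive one extra value
        for (int r = 0; r < n; r++) {
            long size = base + (r < extra ? 1 : 0);
            System.out.println("bucket " + r + ": " + size + " values");
        }
        // bucket 0 ends up one value larger than buckets 1 and 2,
        // so outcome 0 is (slightly) more likely than 1 or 2.
    }
}
```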
This does not happen when n is a power of two, since 2^32 is evenly divisible by any smaller power of two. Consider `n = 2`:

0 % 2 = 0
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1

Here, 0 and 1 occur with the same frequency.
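Putting the recommendation above into code: a version of the method that delegates the range reduction to `Random.nextInt(int)`, which sidesteps both the low-order-bit problem and the modulo bias (the class name here is just for illustration):

```java
import java.util.Random;

public class SaferRandom {
    private static final Random rnd = new Random();

    // Returns a uniformly distributed int in [0, n),
    // letting nextInt(int) handle the range reduction correctly.
    static int random(int n) {
        return rnd.nextInt(n);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(random(7)); // always in 0..6
        }
    }
}
```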