I've simplified a bug I'm experiencing down to the following lines of code:
int[] vals = new int[8];
for (int i = 0; i < 1500; i++)
vals[new Random(i).nextInt(8)]++;
System.out.println(Arrays.toString(vals));
The output is: [0, 0, 0, 0, 0, 1310, 190, 0]
Is this just an artifact of choosing consecutive numbers to seed Random and then using nextInt with a power of 2? If so, are there other pitfalls like this I should be aware of, and if not, what am I doing wrong? (I'm not looking for a solution to the above problem, just some understanding about what else could go wrong)
Dan, well-written analysis. As the javadoc is pretty explicit about how numbers are calculated, it's not a mystery as to why this happened as much as if there are other anomalies like this to watch out for-- I didn't see any documentation about consecutive seeds, and I'm hoping someone with some experience with java.util.Random can point out other common pitfalls.
As for the code, the need is for several parallel agents to have repeatably random behavior who happen to choose from an enum 8 elements long as their first step. Once I discovered this behavior, the seeds all come from a master Random object created from a known seed. In the former (sequentially-seeded) version of the program, all behavior quickly diverged after that first call to nextInt, so it took quite a while for me to narrow the program's behavior down to the RNG library, and I'd like to avoid that situation in the future.
As much as possible, the seed for an RNG should itself be random. The seeds that you are using are only going to differ in one or two bits.
There's very rarely a good reason to create two separate RNGs in the one program. Your code is not one of those situations where it makes sense.
Just create one RNG and reuse it, then you won't have this problem.
In response to comment from mmyers:
Do you happen to know java.util.Random
well enough to explain why it picks 5
and 6 in this case?
The answer is in the source code for java.util.Random, which is a linear congruential RNG. When you specify a seed in the constructor, it is manipulated as follows.
seed = (seed ^ 0x5DEECE66DL) & mask;
Where the mask simply retains the lower 48 bits and discards the others.
When generating the actual random bits, this seed is manipulated as follows:
randomBits = (seed * 0x5DEECE66DL + 0xBL) & mask;
Now if you consider that the seeds used by Parker were sequential (0 -1499), and they were used once then discarded, the first four seeds generated the following four sets of random bits:
101110110010000010110100011000000000101001110100
101110110001101011010101011100110010010000000111
101110110010110001110010001110011101011101001110
101110110010011010010011010011001111000011100001
Note that the top 10 bits are indentical in each case. This is a problem because he only wants to generate values in the range 0-7 (which only requires a few bits) and the RNG implementation does this by shifting the higher bits to the right and discarding the low bits. It does this because in the general case the high bits are more random than the low bits. In this case they are not because the seed data was poor.
Finally, to see how these bits convert into the decimal values that we get, you need to know that java.util.Random makes a special case when n is a power of 2. It requests 31 random bits (the top 31 bits from the above 48), multiplies that value by n and then shifts it 31 bits to the right.
Multiplying by 8 (the value of n in this example) is the same as shifting left 3 places. So the net effect of this procedure is to shift the 31 bits 28 places to the right. In each of the 4 examples above, this leaves the bit pattern 101 (or 5 in decimal).
If we didn't discard the RNGs after just one value, we would see the sequences diverge. While the four sequences above all start with 5, the second values of each are 6, 0, 2 and 4 respectively. The small differences in the initial seeds start to have an influence.
In response to the updated question: java.util.Random is thread-safe, you can share one instance across multiple threads, so there is still no need to have multiple instances. If you really have to have multiple RNG instances, make sure that they are seeded completely independently of each other, otherwise you can't trust the outputs to be independent.
As to why you get these kind of effects, java.util.Random is not the best RNG. It's simple, pretty fast and, if you don't look too closely, reasonably random. However, if you run some serious tests on its output, you'll see that it's flawed. You can see that visually here.
If you need a more random RNG, you can use java.security.SecureRandom. It's a fair bit slower, but it works properly. One thing that might be a problem for you though is that it is not repeatable. Two SecureRandom instances with the same seed won't give the same output. This is by design.
So what other options are there? This is where I plug my own library. It includes 3 repeatable pseudo-RNGs that are faster than SecureRandom and more random than java.util.Random. I didn't invent them, I just ported them from the original C versions. They are all thread-safe.
I implemented these RNGs because I needed something better for my evolutionary computation code. In line with my original brief answer, this code is multi-threaded but it only uses a single RNG instance, shared between all threads.