What's with 181783497276652981 and 86825228071

2019-03-07 16:35发布

站内文章 / Java

38 0

混吃等死

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Why were 181783497276652981 and 8682522807148012 chosen in Random.java?

Here's the relevant source code from Java SE JDK 1.7:

/**
 * Creates a new random number generator. This constructor sets
 * the seed of the random number generator to a value very likely
 * to be distinct from any other invocation of this constructor.
 */
public Random() {
    this(seedUniquifier() ^ System.nanoTime());
}

private static long seedUniquifier() {
    // L'Ecuyer, "Tables of Linear Congruential Generators of
    // Different Sizes and Good Lattice Structure", 1999
    for (;;) {
        long current = seedUniquifier.get();
        long next = current * 181783497276652981L;
        if (seedUniquifier.compareAndSet(current, next))
            return next;
    }
}

private static final AtomicLong seedUniquifier
    = new AtomicLong(8682522807148012L);

So, invoking new Random() without any seed parameter takes the current "seed uniquifier" and XORs it with System.nanoTime(). Then it uses 181783497276652981 to create another seed uniquifier to be stored for the next time new Random() is called.

The literals 181783497276652981L and 8682522807148012L are not placed in constants, but they don't appear anywhere else.

At first the comment gives me an easy lead. Searching online for that article yields the actual article. 8682522807148012 doesn't appear in the paper, but 181783497276652981 does appear -- as a substring of another number, 1181783497276652981, which is 181783497276652981 with a 1 prepended.

The paper claims that 1181783497276652981 is a number that yields good "merit" for a linear congruential generator. Was this number simply mis-copied into Java? Does 181783497276652981 have an acceptable merit?

And why was 8682522807148012 chosen?

Searching online for either number yields no explanation, only this page that also notices the dropped 1 in front of 181783497276652981.

Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?

回答1:

Was this number simply mis-copied into Java?

Yes, seems to be a typo.
Does 181783497276652981 have an acceptable merit?

This could be determined using the evaluation algorithm presented in the paper. But the merit of the "original" number is probably higher.
And why was 8682522807148012 chosen?

Seems to be random. It could be the result of System.nanoTime() when the code was written.
Could other numbers have been chosen that would have worked as well as these two numbers?

Not every number would be equally "good". So, no.

Seeding Strategies

There are differences in the default-seeding schema between different versions and implementation of the JRE.

public Random() { this(System.currentTimeMillis()); }

public Random() { this(++seedUniquifier + System.nanoTime()); }

public Random() { this(seedUniquifier() ^ System.nanoTime()); }

The first one is not acceptable if you create multiple RNGs in a row. If their creation times fall in the same millisecond range, they will give completely identical sequences. (same seed => same sequence)

The second one is not thread safe. Multiple threads can get identical RNGs when initializing at the same time. Additionally, seeds of subsequent initializations tend to be correlated. Depending on the actual timer resolution of the system, the seed sequence could be linearly increasing (n, n+1, n+2, ...). As stated in How different do random seeds need to be? and the referenced paper Common defects in initialization of pseudorandom number generators, correlated seeds can generate correlation among the actual sequences of multiple RNGs.

The third approach creates randomly distributed and thus uncorrelated seeds, even across threads and subsequent initializations. So the current java docs:

This constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.

could be extended by "across threads" and "uncorrelated"

Seed Sequence Quality

But the randomness of the seeding sequence is only as good as the underlying RNG. The RNG used for the seed sequence in this java implementation uses a multiplicative linear congruential generator (MLCG) with c=0 and m=2^64. (The modulus 2^64 is implicitly given by the overflow of 64bit long integers) Because of the zero c and the power-of-2-modulus, the "quality" (cycle length, bit-correlation, ...) is limited. As the paper says, besides the overall cycle length, every single bit has an own cycle length, which decreases exponentially for less significant bits. Thus, lower bits have a smaller repetition pattern. (The result of seedUniquifier() should be bit-reversed, before it is truncated to 48-bits in the actual RNG)

But it is fast! And to avoid unnecessary compare-and-set-loops, the loop body should be fast. This probably explains the usage of this specific MLCG, without addition, without xoring, just one multiplication.

And the mentioned paper presents a list of good "multipliers" for c=0 and m=2^64, as 1181783497276652981.

All in all: A for effort @ JRE-developers ;) But there is a typo. (But who knows, unless someone evaluates it, there is the possibility that the missing leading 1 actually improves the seeding RNG.)

But some multipliers are definitely worse: "1" leads to a constant sequence. "2" leads to a single-bit-moving sequence (somehow correlated) ...

The inter-sequence-correlation for RNGs is actually relevant for (Monte Carlo) Simulations, where multiple random sequences are instantiated and even parallelized. Thus a good seeding strategy is necessary to get "independent" simulation runs. Therefore the C++11 standard introduces the concept of a Seed Sequence for generating uncorrelated seeds.

回答2:

If you consider that the equation used for the random number generator is:

Where X(n+1) is the next number, a is the multipler, X(n) is the current number, c is the increment and m is the modulus.

If you look further into Random, a, c and m are defined in the header of the class

private static final long multiplier = 0x5DEECE66DL;   //= 25214903917 -- 'a'
private static final long addend = 0xBL;               //= 11          -- 'c'
private static final long mask = (1L << 48) - 1;       //= 2 ^ 48 - 1  -- 'm'

and looking at the method protected int next(int bits) this is were the equation is implemented

nextseed = (oldseed * multiplier + addend) & mask;
//X(n+1) =  (X(n)   *      a     +    c  ) mod m

This implies that the method seedUniquifier() is actually getting X(n) or in the first case at initialisation X(0) which is actually 8682522807148012 * 181783497276652981, this value is then modified further by the value of System.nanoTime(). This algorithm is consistent with the equation above but with the following X(0) = 8682522807148012, a = 181783497276652981, m = 2 ^ 64 and c = 0. But as the mod m of is preformed by the long overflow the above equation just becomes

Looking at the paper, the value of a = 1181783497276652981 is for m = 2 ^ 64, c = 0. So it appears to just be a typo and the value 8682522807148012 for X(0) which appears to be a seeming randomly chosen number from legacy code for Random. As seen here. But the merit of these chosen numbers could still be valid but as mentioned by Thomas B. probably not as "good" as the one in the paper.

EDIT - Below original thoughts have since been clarified so can be disregarded but leaving it for reference

This leads me the conclusions:

The reference to the paper is not for the value itself but for the methods used to obtain the values due to the different values of a, c and m
It is mere coincidence that the value is otherwise the same other than the leading 1 and the comment is misplaced (still struggling to believe this though)

There has been a serious misunderstanding of the tables in the paper and the developers have just chosen a value at random as by the time it is multiplied out what was the point in using the table value in the first place especially as you can just provide your own seed value any way in which case these values are not even taken into account

So to answer your question

Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?

Yes, any number could have been used, in fact if you specify a seed value when you Instantiate Random you are using any other value. This value does not have any effect on the performance of the generator, this is determined by the values of a,c and m which are hard coded within the class.

回答3:

As per the link you provided, they have chosen (after adding the missing 1 :) ) the best yield from 2^64 because long can't have have a number from 2^128