How should I use random.jumpahead in Python

Posted 2019-04-12 04:53

I have an application that runs a certain experiment 1000 times (multi-threaded, so that multiple experiments run at the same time). Every experiment needs approximately 50,000 random.random() calls.

What is the best approach to make this really random? I could copy a random object to every experiment and then do a jumpahead of 50,000 * expid. The documentation suggests that jumpahead(1) already scrambles the state, but is that really true?

Or is there another way to do this that would be 'the best way'?

(No, the random numbers are not used for security, but for a Metropolis-Hastings algorithm. The only requirement is that the experiments are independent; it does not matter whether the random sequence is somehow predictable.)

Tags: python random
4 Answers
再贱就再见
#2 · 2019-04-12 05:30

You shouldn't use that function. There is no proof that it works correctly with the Mersenne Twister generator. Indeed, it was removed in Python 3 for that reason.

For more information about generating pseudo-random numbers in parallel environments, see this article by David Hill.

神经病院院长
#3 · 2019-04-12 05:40

Per the random module docs at python.org:

"You can instantiate your own instances of Random to get generators that don’t share state."

And there's also a relevant-looking note on jumpahead, as you mention. But the guarantees there are kind of vague. If the calls to OS-provided randomness aren't so expensive as to dominate your running time, I'd skip all the subtlety and do something like:

import os, random  # os.urandom for seeding, random.Random for per-experiment generators
randoms = [random.Random(os.urandom(4)) for _ in range(num_expts)]

If num_expts is ~1000, then you're unlikely to have any collisions in your seed (birthday paradox says you need about 65000 experiments before there's a >50% probability that you have a collision). If this isn't good enough for you or if the number of experiments is more like 100k instead of 1k, then I think it's reasonable to follow this up with

for idx, r in enumerate(randoms):
    r.jumpahead(idx)

Note that I don't think it will work to just make your seed longer (os.urandom(8), for example), since the random docs state that the seed must be hashable, and so on a 32-bit platform you're only going to get at most 32 bits (4 bytes) of useful entropy in your seed.
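
If you do want more than 32 bits of seed entropy, one workaround (not from the answer above; just a sketch of one option) is to convert the bytes into a plain integer first, since integer seeds are, at least in CPython, fed to the Mersenne Twister initialization directly rather than being hashed:

import os
import random
import struct

def fresh_rng():
    # Unpack 8 OS-random bytes into an unsigned 64-bit integer seed, so all
    # 64 bits reach the generator instead of a platform-sized hash() value.
    seed = struct.unpack("<Q", os.urandom(8))[0]
    return random.Random(seed)

# fresh_rng is just an illustrative helper name; 1000 matches the question.
randoms = [fresh_rng() for _ in range(1000)]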

This question piqued my curiosity, so I went and looked at the code implementing the random module. I am definitely not a PRNG expert, but it does seem like slightly differing values of n in jumpahead(n) will lead to markedly different Random instance states. (Always scary to contradict Alex Martelli, but the code does use the value of n when shuffling the random state).

叛逆
#4 · 2019-04-12 05:45

I could copy a random object to every experiment and then do a jumpahead of 50,000 * expid.

Approximately correct. Each thread gets its own Random instance.

Seed all of them with the same seed value. Use a constant while testing; use /dev/random when you "run for the record".

Edit. Outside Python, and in older implementations, use jumpahead(50000 * expid) to avoid the situation where two generators wind up with parallel sequences of values. In any reasonably current (post-2.3) Python, jumpahead is no longer linear and using expid is sufficient to scramble the state.

You can't simply do jumpahead(1) in each thread, since that will leave them all synchronized. Use jumpahead(expid) to ensure each thread's state is distinctly scrambled.

The documentation suggests that jumpahead(1) already scrambles the state, but is that really true?

Yes, jumpahead does indeed "scramble" the state. Recall that for a given seed you get one long but fixed sequence of pseudo-random numbers. You're jumping ahead in this sequence. To pass randomness tests, you must get all your values from this one sequence.

Edit. Once upon a time, jumpahead(1) was limited. Now jumpahead(1) really does a much more thorough scrambling of the state. The scrambling, however, is deterministic, so you still can't simply do jumpahead(1) in each thread.

If you have multiple generators with different seeds, you violate the "one sequence from one seed" assumption and your numbers aren't going to be as random as if you get them from a single sequence.

If you only jumpahead 1, you may be getting parallel sequences which might be similar. [This similarity might not be detectable; theoretically, there is a similarity.]

When you jumpahead 50,000, you ensure that you follow the 1-sequence-1-seed premise. You also ensure that you won't have adjacent sequences of numbers in two experiments.

Finally, you also have repeatability. For a given seed, you get consistent results.

Same jumpahead: Not Good.

>>> y=random.Random( 1 )
>>> z=random.Random( 1 )
>>> y.jumpahead(1)
>>> z.jumpahead(1)
>>> [ y.random() for i in range(5) ]
[0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
>>> [ z.random() for i in range(5) ]
[0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
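
By contrast, giving each instance a distinct jumpahead argument breaks the lock-step (a sketch, Python 2 only, since jumpahead was removed in Python 3; the two printed lists will differ rather than being identical as above):

import random

y = random.Random(1)
z = random.Random(1)
y.jumpahead(1)
z.jumpahead(2)  # a different argument scrambles the state differently
print([y.random() for i in range(5)])
print([z.random() for i in range(5)])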
forever°为你锁心
#5 · 2019-04-12 05:48

jumpahead(1) is indeed sufficient (and identical to jumpahead(50000) or any other such call in the current implementation of random; I believe that came in at the same time as the Mersenne Twister-based implementation). So use whatever argument fits well with your program's logic. (Do use a separate random.Random instance per thread for thread-safety purposes, of course, as your question already hints.)
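
A minimal sketch of that setup (the seed value, the experiment count, and the per-experiment jumpahead argument are illustrative choices, not prescribed by this answer):

import random

SEED = 12345             # illustrative constant; swap in OS entropy for real runs
NUM_EXPERIMENTS = 1000   # matches the count in the question

def make_rng(expid):
    rng = random.Random(SEED)  # one independent generator per experiment/thread
    rng.jumpahead(expid)       # a per-experiment argument keeps the streams apart
    return rng

rngs = [make_rng(i) for i in range(NUM_EXPERIMENTS)]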

(Numbers generated by the random module are not meant to be cryptographically strong, so it's a good thing that you're not using them for security purposes ;-).
