I am studying the multiprocessing module of Python. I have two cases:
Ex. 1
import random

def Foo(nbr_iter):
    for step in xrange(int(nbr_iter)):
        print random.uniform(0, 1)
...
from multiprocessing import Pool
if __name__ == "__main__":
...
pool = Pool(processes=nmr_parallel_block)
pool.map(Foo, nbr_trial_per_process)
Ex 2. (using numpy)
import numpy as np

def Foo_np(nbr_iter):
    np.random.seed()
    print np.random.uniform(0, 1, nbr_iter)
In both cases the random number generators are seeded in their forked processes.
Why do I have to do the seeding explicitly in the numpy example, but not in the Python example?
Here is a nice blog post that explains how numpy.random works. If you use np.random.rand(), it uses the seed created when the np.random module was imported, so you need to create a new seed in each forked process manually (cf. the examples in the blog post). The Python random module does not have this issue and automatically generates a different seed for each process.
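A minimal Python 3 sketch of that manual reseeding (the worker name, pool size, and per-worker counts here are arbitrary, not from the question):

```python
from multiprocessing import Pool

import numpy as np

def foo_np(nbr_iter):
    # Reseed inside the worker, so each process pulls its own fresh OS entropy
    np.random.seed()
    return np.random.uniform(0, 1, nbr_iter)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each worker now prints a different set of numbers
        for chunk in pool.map(foo_np, [3, 3, 3, 3]):
            print(chunk)
```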
If no seed is provided explicitly, numpy.random will seed itself using an OS-dependent source of randomness. Usually it will use /dev/urandom on Unix-based systems (or some Windows equivalent), but if this is not available for some reason then it will seed itself from the wall clock. Since self-seeding occurs at the time when a new subprocess forks, it is possible for multiple subprocesses to inherit the same seed if they forked at the same time, leading to identical random variates being produced by different subprocesses. Often this correlates with the number of concurrent threads you are running. For example:
You can see that groups of up to 8 threads simultaneously forked with the same seed, giving me identical random sequences (I've marked the first group with arrows).
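A sketch of that collision, and of the explicit-per-process-seed fix this answer recommends, assuming a Unix-like system where the fork start method is available (the function names and worker counts are mine, for illustration):

```python
from multiprocessing import get_context

import numpy as np

def draw(q, seed=None):
    # With no seed, the forked child just keeps the RNG state it inherited
    # from the parent; with an explicit seed it gets its own stream.
    if seed is not None:
        np.random.seed(seed)
    q.put(np.random.uniform(0, 1, 3))

def run(seeds):
    ctx = get_context("fork")  # fork copies the parent's memory, RNG state included
    q = ctx.Queue()
    procs = [ctx.Process(target=draw, args=(q, s)) for s in seeds]
    for p in procs:
        p.start()
    rows = [q.get() for _ in procs]
    for p in procs:
        p.join()
    return rows

if __name__ == "__main__":
    # Every child inherits the same parent state, so all four rows coincide:
    for row in run([None] * 4):
        print(row)
    # Distinct explicit seeds give each child its own sequence:
    for row in run([0, 1, 2, 3]):
        print(row)
```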
Calling np.random.seed() within a subprocess forces the thread-local RNG instance to seed itself again from /dev/urandom or the wall clock, which will (probably) prevent you from seeing identical output from multiple subprocesses. Best practice is to explicitly pass a different seed (or numpy.random.RandomState instance) to each subprocess, e.g.:

I'm not entirely sure what underlies the differences between random and numpy.random in this respect (perhaps random has slightly different rules for selecting a source of randomness to self-seed with, compared to numpy.random?). I would still recommend explicitly passing a seed or a random.Random instance to each subprocess to be on the safe side. You could also use the .jumpahead() method of random.Random, which is designed for shuffling the states of Random instances in multithreaded programs.

numpy 1.17 just introduced [quoting] "..three strategies implemented that can be used to produce repeatable pseudo-random numbers across multiple processes (local or distributed).."
The first strategy uses a SeedSequence object. There are many parent/child options there, but for our case, if you want each process to generate different random numbers, and a different set at each run:
(python3, printing 3 random numbers from 4 processes)
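A sketch of that first strategy (the worker name generate is mine; the parent SeedSequence draws fresh OS entropy, so each run prints different numbers):

```python
from multiprocessing import Pool

import numpy as np

def generate(seed_seq):
    # Each worker builds its own Generator from the child SeedSequence it receives
    rng = np.random.default_rng(seed_seq)
    return rng.uniform(0, 1, 3)

if __name__ == "__main__":
    # spawn() derives 4 statistically independent child seeds from the parent
    children = np.random.SeedSequence().spawn(4)
    with Pool(processes=4) as pool:
        for row in pool.map(generate, children):
            print(row)
```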
If you want the same results for reproducibility purposes, you can simply reseed numpy with the same seed (17):
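A self-contained sketch with the entropy pinned to 17, so every run prints the same four rows while the processes still get distinct streams (the worker name generate is made up):

```python
from multiprocessing import Pool

import numpy as np

def generate(seed_seq):
    # Build a per-process Generator from the child SeedSequence
    rng = np.random.default_rng(seed_seq)
    return rng.uniform(0, 1, 3)

if __name__ == "__main__":
    # Pinning the entropy to 17 makes spawn() derive the same child seeds every run
    children = np.random.SeedSequence(17).spawn(4)
    with Pool(processes=4) as pool:
        for row in pool.map(generate, children):
            print(row)
```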