I'm having some problems with simulations that use concurrent.futures and np.random.
Example:
import numpy as np
from concurrent.futures import ProcessPoolExecutor, as_completed
from time import sleep

def calc_g():
    sleep(1)
    u = np.random.uniform()
    print(u)

futures = {}
with ProcessPoolExecutor() as executor:
    for i in range(10):
        job = executor.submit(calc_g)
        futures[job] = i
    for job in as_completed(futures):
        job.result()
My results from this simulation are:
python teste.py
0.590820857053
0.590820857053
0.590820857053
0.590820857053
0.890384312465
0.890384312465
0.890384312465
0.890384312465
0.391709923204
0.391709923204
If I remove the sleep call from calc_g(), the results look a little more random:
python teste.py
0.116725033305
0.919465043075
0.116725033305
0.116725033305
0.608303685887
0.59397039096
0.608862016487
0.800008484487
0.689917804793
0.116725033305
I think this has to do with how numpy seeds its random number generator. The child processes are forked from the main program, so each one receives a copy of the same seed (RNG state). Since random number generation is deterministic once the seed is fixed, every process produces the same values from np.random.uniform().
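To make that mechanism visible, here is a minimal sketch of my own (not the original script), assuming the default "fork" start method on Linux: every forked child inherits an identical copy of the parent's global RandomState, so each child's first draw is the same number.

import numpy as np
from multiprocessing import Process

def first_draw():
    # First use of the global RandomState that was copied from the parent at fork time.
    print(np.random.uniform())

if __name__ == "__main__":
    # With the "fork" start method, all three children print the same value,
    # because they start from identical copies of the parent's RNG state.
    procs = [Process(target=first_draw) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()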
Can someone explain this in more detail, with examples?
How should I use np.random in parallel tasks to simulate randomness, such as coin tossing?
For independent PRNG streams in multiprocessing, give each process its own RandomState instead of relying on the module-level state that is copied to every child when it is forked. The simplest fix is to change the line

u = np.random.uniform()

to

u = np.random.RandomState().uniform()

so that every call seeds a fresh generator from OS entropy.
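Applied to the example above, a complete sketch of the corrected script might look like this (my own reconstruction; the only substantive change is the RandomState line, plus a __main__ guard so it also works with the spawn start method):

import numpy as np
from concurrent.futures import ProcessPoolExecutor, as_completed
from time import sleep

def calc_g():
    sleep(1)
    # A fresh RandomState() seeds itself from OS entropy, so every task
    # gets an independent stream even though the worker was forked.
    u = np.random.RandomState().uniform()
    print(u)

if __name__ == "__main__":
    futures = {}
    with ProcessPoolExecutor() as executor:
        for i in range(10):
            futures[executor.submit(calc_g)] = i
        for job in as_completed(futures):
            job.result()

Creating a RandomState per task is the simplest route; if you run many short tasks you could instead create one generator per worker (for example via the executor's initializer argument) or pass an explicit seed to each task, which also makes the runs reproducible.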