Child processes generating same “random” numbers a

Posted 2019-06-27 00:25

Question:

I'm having some problems with simulations using concurrent.futures and np.random.

Example:

import numpy as np
from concurrent.futures import ProcessPoolExecutor, as_completed
from time import sleep

def calc_g():
    sleep(1)
    u = np.random.uniform()  # drawn from NumPy's global generator
    print(u)

futures = {}
with ProcessPoolExecutor() as executor:

    for i in range(0,10):  
        job = executor.submit(calc_g)
        futures[job] = i

    for job in as_completed(futures):
        job.result()

My results in this simulation are:

python teste.py
0.590820857053
0.590820857053
0.590820857053
0.590820857053
0.890384312465
0.890384312465
0.890384312465
0.890384312465
0.391709923204
0.391709923204

If I remove the sleep call from calc_g(), the results seem to be a little more random:

python teste.py
0.116725033305
0.919465043075
0.116725033305
0.116725033305
0.608303685887
0.59397039096
0.608862016487
0.800008484487
0.689917804793
0.116725033305

I think this has to do with how NumPy seeds its generator. Python forks the child processes from the main program, so each child gets a copy of the same generator state. Since random number generation is deterministic once the seed is set, np.random.uniform() returns the same values in every child.
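The hypothesis above can be illustrated without starting any processes. The following is only a sketch: it assumes that a forked child begins with an exact copy of the parent's generator state, and it imitates that by restoring a saved state into a fresh RandomState. The helper name child_draw is hypothetical.

```python
import numpy as np

# Seed the global generator once, as the parent process would be at startup.
np.random.seed(0)
parent_state = np.random.get_state()  # the state a fork would copy into each child


def child_draw(state):
    # Stand-in for a forked child: it starts from the parent's exact state.
    rng = np.random.RandomState()
    rng.set_state(state)
    return rng.uniform()


# Four simulated children, all starting from the same copied state.
draws = [child_draw(parent_state) for _ in range(4)]
print(draws)  # four identical values
```

Each simulated child produces the same first value, which matches the repeated blocks in the output above.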

Can someone explain this better, with examples?

How should I use np.random in parallel tasks to simulate random events such as coin tosses?

Answer 1:

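For independent PRNG streams in multiprocessing, give each process its own RandomState. Calling RandomState() with no seed takes a fresh seed from the operating system's entropy source (falling back to the clock), so each instance starts from its own state instead of the one copied from the parent. The simplest fix is to change this line: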

u = np.random.uniform()

To this:

u = np.random.RandomState().uniform()
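If the simulation also needs to be reproducible, one possible variation is to hand each task its own explicit seed instead of relying on OS entropy. The sketch below is not part of the original answer. It assumes NumPy 1.17 or later, where np.random.SeedSequence and np.random.default_rng are available, and the names run, n_tasks and root_seed are hypothetical.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor


def calc_g(seed_seq):
    # Build a generator private to this task from its own seed sequence.
    rng = np.random.default_rng(seed_seq)
    return rng.uniform()


def run(n_tasks=10, root_seed=12345):
    # spawn() derives one child seed sequence per task from a single root seed.
    children = np.random.SeedSequence(root_seed).spawn(n_tasks)
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(calc_g, child) for child in children]
        # Collect results in submission order so reruns are comparable.
        return [f.result() for f in futures]
```

Call run() from under an `if __name__ == "__main__":` guard in the script. With the same root_seed, every rerun should give the same list of values, while the tasks within one run draw from different streams.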