Why doesn't numpy.random and multiprocessing play nice?

Posted 2020-04-06 01:30

Question:

I have a random walk function that uses numpy.random to do the random step. The function walk, by itself, works just fine. Run repeatedly, it works as expected in most cases; in conjunction with multiprocessing, however, it fails. Why does multiprocessing get it wrong?

import numpy as np

def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    w = np.cumsum(x + np.random.uniform(-delta,delta,n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n

N = 10

# run N trials, all starting from x=0
pwalk = np.vectorize(walk)
print(pwalk(np.zeros(N)))

# run again, using list comprehension instead of np.vectorize
print([walk(0) for i in range(N)])

# run again, using multiprocessing's map
import multiprocessing as mp
p = mp.Pool()
print(p.map(walk, [0]*N))

The results are typically something like:

[47 16 72  8 15  4 38 52 12 41]
[7, 45, 25, 13, 16, 19, 12, 30, 23, 4]
[3, 3, 3, 3, 3, 3, 3, 14, 3, 14]

The first two methods obviously show randomness, while the last one doesn't. What's going on, so that multiprocessing doesn't get it right?

If you add a sleep so it's a sleepwalk and there's significant delay, multiprocessing still gets it wrong.

However, if you replace the call to np.random.uniform with a non-array method like [(random.random()-.5) for i in range(n)], then it works as expected.

So why doesn't numpy.random and multiprocessing play nice?

Answer 1:

What's going on, so that multiprocessing doesn't get it right?

You need to reseed in each process to make sure the pseudo-random streams are independent of one another.
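To see why this matters, here is a minimal sketch that needs no multiprocessing at all. It assumes the workers start out with copies of the same generator state, which is what happens when a process is forked. The names parent, worker_a and worker_b are made up for illustration, and the seed value is arbitrary.

```python
import numpy as np

# Hypothetical stand-in for the parent's generator; the seed is arbitrary.
parent = np.random.RandomState(12345)
state = parent.get_state()

# Two "workers" that each start from a copy of the same state,
# as forked child processes would.
worker_a = np.random.RandomState()
worker_a.set_state(state)
worker_b = np.random.RandomState()
worker_b.set_state(state)

# Identical state in, identical "random" steps out.
steps_a = worker_a.uniform(-.2, .2, 5)
steps_b = worker_b.uniform(-.2, .2, 5)
print(np.array_equal(steps_a, steps_b))  # True
```

That is consistent with the repeated 3s and 14s in the output from p.map above.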

I use os.urandom to generate the seeds.
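Something along these lines should do it. This is a sketch, not the only way: walk is copied from the question, reseed is a helper name I made up, and passing it as the Pool initializer is one possible place to hook it in (you could also call it at the top of walk). It assumes Python 3, since it uses int.from_bytes.

```python
import multiprocessing as mp
import os

import numpy as np


def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    w = np.cumsum(x + np.random.uniform(-delta, delta, n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n


def reseed():
    "hypothetical helper: reseed numpy.random in the current process"
    # 4 bytes from the OS give an unsigned 32-bit integer,
    # which is within the range np.random.seed accepts.
    seed = int.from_bytes(os.urandom(4), "little")
    np.random.seed(seed)


if __name__ == "__main__":
    N = 10
    # Each worker runs reseed() once when it starts.
    with mp.Pool(initializer=reseed) as p:
        print(p.map(walk, [0] * N))
```

With the initializer in place, each worker draws from its own stream, so the results from p.map should look as scattered as the other two runs.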