I have a random walk function that uses numpy.random to do the random step. The function walk, by itself, works just fine; in conjunction with multiprocessing, however, it fails. Why does multiprocessing get it wrong?
import numpy as np

def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    # take n uniform random steps from x and accumulate the positions
    w = np.cumsum(x + np.random.uniform(-delta, delta, n))
    # indices at which the walk has left the box
    w = np.where(abs(w) > box)[0]
    # first exit time, or n if the walk never left the box
    return w[0] if len(w) else n
N = 10
# run N trials, all starting from x=0
pwalk = np.vectorize(walk)
print(pwalk(np.zeros(N)))

# run again, using a list comprehension instead of a ufunc
print([walk(0) for i in range(N)])

# run again, using multiprocessing's map
import multiprocessing as mp
p = mp.Pool()
print(p.map(walk, [0]*N))
The results are typically something like...
[47 16 72 8 15 4 38 52 12 41]
[7, 45, 25, 13, 16, 19, 12, 30, 23, 4]
[3, 3, 3, 3, 3, 3, 3, 14, 3, 14]
The first two methods obviously show randomness, while the third doesn't.
What's going on, such that multiprocessing doesn't get it right?
If you add a sleep, so it's a sleepwalk, and there's a significant delay, multiprocessing still gets it wrong.
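For instance, a variant along these lines (a sketch of the experiment; the name sleepwalk and the exact placement and length of the delay are my assumptions, not from the original):

import time
import numpy as np

def sleepwalk(x, n=100, box=.5, delta=.2):
    "perform a random walk, after a significant delay"
    time.sleep(.1)  # hypothetical delay; the repeated results persist regardless
    w = np.cumsum(x + np.random.uniform(-delta, delta, n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n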
However, if you replace the call to np.random.uniform with a non-array method like [(random.random()-.5) for i in range(n)], then it works as expected.
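In other words, something like the following sketch gives ten distinct results under p.map (walk2 is a hypothetical name for the variant; it drops delta, since the expression above fixes the step range at plus or minus .5):

import random
import numpy as np

def walk2(x, n=100, box=.5):
    "perform a random walk with non-array random steps"
    # the substitution described above: stdlib random instead of numpy.random
    steps = np.array([(random.random() - .5) for i in range(n)])
    w = np.cumsum(x + steps)
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n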
So why don't numpy.random and multiprocessing play nice?
What's happening is that each forked worker process inherits an identical copy of the parent's global numpy RNG state, so every worker draws the same stream of numbers. (The standard library's random module gets reseeded after a fork, which is why the non-array version appears to work, but nothing reseeds numpy's generator for you.) You need to reseed in each process to make sure the pseudo-random streams are independent of one another. I use os.urandom to generate the seeds.
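For example, here's a minimal sketch of that approach, reusing walk and N from the question; the initializer name reseed is mine, not part of any API:

import os
import numpy as np
import multiprocessing as mp

def reseed():
    # seed this worker's global numpy RNG from the OS entropy pool;
    # np.random.seed wants an unsigned 32-bit integer, hence 4 bytes
    np.random.seed(int.from_bytes(os.urandom(4), 'little'))

# each worker runs reseed once at startup, so the streams diverge
p = mp.Pool(initializer=reseed)
print(p.map(walk, [0]*N))

Alternatively, calling np.random.seed() with no argument at the top of walk reseeds from fresh OS entropy on every call, which also breaks the shared-state coupling, at the cost of reseeding per task rather than per worker.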