I'm trying to implement multiprocessing to speed up a replication loop, but cannot get it to work in Python 2.7. This is a very simplified version of my program, based on the docs and other answers here at SO (e.g. Python multiprocessing pool.map for multiple arguments). I realize that there are a number of questions on multiprocessing, but so far I haven't been able to solve this issue. Hopefully I haven't overlooked anything too trivial.
Code
    import itertools
    from multiprocessing import Pool

    def func(g, h, i):
        return g + h + i

    def helper(args):
        # unpack ((g, h), i) into (g, h, i) for func
        args2 = args[0] + (args[1],)
        return func(*args2)

    pool = Pool(processes=4)
    result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(20)))
    print result
This works when using map(...), but not when using pool.map(...).
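For comparison, a sketch of the serial version (same func and helper as above), which does run correctly:

    # plain builtin map instead of pool.map -- this works
    result = map(helper, itertools.izip(itertools.repeat((2, 3)), range(20)))
    print result  # [5, 6, ..., 24]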
Error message:
    Process PoolWorker-3:
    Traceback (most recent call last):
      File "C:\Program_\EPD_python27\lib\multiprocessing\process.py", line 258, in _bootstrap
        self.run()
      File "C:\Program_\EPD_python27\lib\multiprocessing\process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Program_\EPD_python27\lib\multiprocessing\pool.py", line 85, in worker
        task = get()
      File "C:\Program_\EPD_python27\lib\multiprocessing\queues.py", line 376, in get
        return recv()
    AttributeError: 'module' object has no attribute 'helper'
On my OS X, with Python 2.7, your code runs fine and outputs the expected list of sums.
I can see your Python paths contain EPD_python27, so maybe try using a vanilla Python distribution instead of the Enthought Python Distribution.

UPDATE: Please see @fileunderwater's answer for a solution; I've run into this once myself, but had totally forgotten about it :)
Explanation: The problem happens (only on Windows for some reason, but it could just as well happen on OS X and Linux) because your module contains top-level code. What multiprocessing does is import your module in the subprocess and execute it; if the module contains top-level code, that code is evaluated/executed immediately as the module gets imported. By wrapping it in main() and only calling main() conditionally (i.e. inside an if __name__ == '__main__' block), you prevent this from happening. This is also more correct on OS X and Linux, and is generally preferred over putting code directly at module level.

The problem is solved by adding a main() function, as in the sketch below. Based on the answer from @ErikAllik I'm thinking that this might be a Windows-specific problem.
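A minimal sketch of that fix (reconstructing the code referenced above, reusing func and helper from the question):

    import itertools
    from multiprocessing import Pool

    def func(g, h, i):
        return g + h + i

    def helper(args):
        args2 = args[0] + (args[1],)
        return func(*args2)

    def main():
        # the guard below keeps this from running again when the
        # worker processes import the module
        pool = Pool(processes=4)
        result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(20)))
        print result

    if __name__ == '__main__':
        main()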
edit: Here is a clear and informative tutorial on multiprocessing in Python.
There's a fork of multiprocessing called pathos (note: use the version on GitHub) that doesn't need starmap or helpers or any of that other stuff -- the map functions mirror the API of Python's map, so map can take multiple arguments. With pathos, you can also generally do multiprocessing in the interpreter, instead of being stuck in the __main__ block. pathos is due for a release, after some mild updating -- mostly conversion to Python 3.x.
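A minimal sketch of what that looks like (assuming pathos is installed, e.g. via pip install pathos; the ProcessingPool import is the one pathos.multiprocessing exposes):

    from pathos.multiprocessing import ProcessingPool as Pool

    def func(g, h, i):
        return g + h + i

    pool = Pool(4)
    # pathos' map mirrors the builtin map, so multiple argument
    # sequences are passed directly -- no helper function needed
    result = pool.map(func, [2] * 20, [3] * 20, range(20))
    print result  # [5, 6, ..., 24]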