I'm trying to use multiprocessing to speed up a replication loop, but I cannot get it to work in Python 2.7. This is a very simplified version of my program, based on the docs and other answers here on SO (e.g. "Python multiprocessing pool.map for multiple arguments"). I realize that there are a number of questions on multiprocessing, but so far I haven't been able to solve this issue. Hopefully I haven't overlooked anything too trivial.
Code
import itertools
from multiprocessing import Pool

def func(g, h, i):
    return g + h + i

def helper(args):
    args2 = args[0] + (args[1],)
    return func(*args2)

pool = Pool(processes=4)
result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(20)))
print result
This works when using map(...), but not when using pool.map(...).
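To see exactly what helper receives, the same argument plumbing can be run serially, with no worker processes involved. This is a minimal sketch that uses the built-in zip in place of itertools.izip so that it runs on both Python 2 and Python 3; build_args is a hypothetical name introduced here only for illustration.

```python
import itertools

def func(g, h, i):
    return g + h + i

def helper(args):
    # args looks like ((2, 3), 0): a fixed tuple plus one varying value
    args2 = args[0] + (args[1],)
    return func(*args2)

def build_args(n):
    # Hypothetical helper: pair the constant (2, 3) with each of 0..n-1
    return list(zip(itertools.repeat((2, 3)), range(n)))

pairs = build_args(3)
serial = [helper(p) for p in build_args(20)]
print(pairs)
print(serial)
```

Since the serial version produces the expected values, the failure most likely comes from how pool.map hands helper to the workers, not from helper itself.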
Error message:
Process PoolWorker-3:
Traceback (most recent call last):
  File "C:\Program_\EPD_python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Program_\EPD_python27\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program_\EPD_python27\lib\multiprocessing\pool.py", line 85, in worker
    task = get()
  File "C:\Program_\EPD_python27\lib\multiprocessing\queues.py", line 376, in get
    return recv()
AttributeError: 'module' object has no attribute 'helper'
The problem is solved by moving the pool code into a main() function and calling it only under an if __name__ == '__main__' guard:
import itertools
from multiprocessing import Pool

def func(g, h, i):
    return g + h + i

def helper(args):
    args2 = args[0] + (args[1],)
    return func(*args2)

def main():
    pool = Pool(processes=4)
    result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(10)))
    print result

if __name__ == '__main__':
    main()
Based on the answer from @ErikAllik I'm thinking that this might be a Windows-specific problem.
edit: Here is a clear and informative tutorial on multiprocessing in python.
There's a fork of multiprocessing called pathos (note: use the version on GitHub) that doesn't need starmap or helpers or all of that other stuff. Its map functions mirror the API of Python's built-in map, so map can take multiple arguments. With pathos, you can also generally do multiprocessing in the interpreter, instead of being stuck in the __main__ block. pathos is due for a release after some mild updating, mostly conversion to Python 3.x.
Python 2.7.5 (default, Sep 30 2013, 20:15:49)
[GCC 4.2.1 (Apple Inc. build 5566)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathos.multiprocessing import ProcessingPool
>>> pool = ProcessingPool(nodes=4)
>>>
>>> def func(g,h,i):
...     return g+h+i
...
>>> pool.map(func, [1,2,3],[4,5,6],[7,8,9])
[12, 15, 18]
>>>
>>> # also can pickle stuff like lambdas
>>> result = pool.map(lambda x: x**2, range(10))
>>> result
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>>
>>> # also does asynchronous map
>>> result = pool.amap(pow, [1,2,3], [4,5,6])
>>> result.get()
[1, 32, 729]
>>>
>>> # or can return a map iterator
>>> result = pool.imap(pow, [1,2,3], [4,5,6])
>>> result
<processing.pool.IMapIterator object at 0x110c2ffd0>
>>> list(result)
[1, 32, 729]
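The lambda example is notable because the standard library's pickle, which multiprocessing uses to send work to its workers, stores a function only as a reference to its module and name, whereas pathos relies on the dill serializer. Below is a small sketch of the standard-library behaviour; can_pickle is a hypothetical helper name, not part of any library.

```python
import pickle

def can_pickle(obj):
    # Hypothetical helper: True if the standard pickle module accepts obj
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A named, importable function is pickled by reference, so the
# round trip hands back the very same object.
same_object = pickle.loads(pickle.dumps(len)) is len

# A lambda has no importable name, so the lookup by name fails.
lambda_ok = can_pickle(lambda x: x ** 2)

print(same_object)
print(lambda_ok)
```

Pickling by reference is consistent with the AttributeError raised inside recv() in the question's traceback: the worker receives only the name helper and has to find it in the module it imported.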
On my OS X, with Python 2.7, your code outputs:
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
I can see your Python paths contain EPD_python27, so maybe try using a vanilla Python distribution instead of the Enthought Python Distribution.
UPDATE: Please see @fileunderwater's answer for a solution; I've run into this once myself, but had totally forgotten about it :)
Explanation: The problem happens because your module contains top-level code. It shows up on Windows because Windows has no fork(), so multiprocessing starts each worker as a fresh interpreter that imports your module. If the module contains top-level code, that code is executed immediately as the module gets imported, in every worker. By wrapping it in main() and only calling main() conditionally (i.e. inside an if __name__ == '__main__' block), you prevent this from happening. On OS X and Linux, where Python 2.7 forks the workers, the unguarded version happens to work, but the guarded form is more correct there too and is generally preferred over putting code right in the module.
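The effect of the guard can be imitated without starting any processes. The sketch below runs the same source twice with exec, once with __name__ set to '__main__' (as when the script is launched directly) and once under a different name (standing in for a worker importing the module). MODULE_SOURCE, run_as and the name 'imported_by_worker' are hypothetical and only illustrate the mechanism; this is not how multiprocessing itself loads the module.

```python
MODULE_SOURCE = '''
events = []
events.append("top-level code ran")

def main():
    events.append("main() ran")

if __name__ == "__main__":
    main()
'''

def run_as(name):
    # Execute the source in a fresh namespace whose __name__ is `name`
    namespace = {"__name__": name}
    exec(MODULE_SOURCE, namespace)
    return namespace["events"]

as_script = run_as("__main__")
as_import = run_as("imported_by_worker")
print(as_script)
print(as_import)
```

Only the guarded call is skipped on import. Anything left at module level, such as Pool(processes=4) in the original script, runs every time the module is imported.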