I'm trying to use multiprocessing's Pool.map() function to divide out work simultaneously. When I use the following code, it works fine:
import multiprocessing

def f(x):
    return x*x

def go():
    pool = multiprocessing.Pool(processes=4)
    print pool.map(f, range(10))

if __name__ == '__main__':
    go()
However, when I use it in a more object-oriented approach, it doesn't work. The error message it gives is:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
__builtin__.instancemethod failed
This occurs when the following is my main program:
import someClass

if __name__ == '__main__':
    sc = someClass.someClass()
    sc.go()
and the following is my someClass class:
import multiprocessing

class someClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print pool.map(self.f, range(10))
Anyone know what the problem could be, or an easy way around it?
There's another short-cut you can use, although it can be inefficient depending on what's in your class instances.
As everyone has said, the problem is that the multiprocessing code has to pickle the things it sends to the sub-processes it has started, and the pickler doesn't do instance methods. However, instead of sending the instance method, you can send the actual class instance, plus the name of the function to call, to an ordinary function that then uses getattr to call the instance method, thus creating the bound method in the Pool subprocess. This is similar to defining a __call__ method, except that you can call more than one member function. Stealing @EricH.'s code from his answer and annotating it a bit (I retyped it, hence all the name changes and such; for some reason this seemed easier than cut-and-paste :-) ) for illustration of all the magic:
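A minimal sketch of the trick, with a hypothetical helper named _call_method standing in for the original annotated code:

import multiprocessing

def _call_method(instance, name, *args):
    # Module-level functions pickle by reference, so this helper travels
    # to the worker; getattr then rebuilds the bound method on the
    # worker's copy of the instance.
    return getattr(instance, name)(*args)

class someClass(object):
    def f(self, x):
        return x*x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        # Ship (instance, method-name, argument) instead of the
        # unpicklable bound method self.f.
        results = [pool.apply_async(_call_method, (self, 'f', x))
                   for x in range(10)]
        print [r.get() for r in results]

if __name__ == '__main__':
    someClass().go()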
The output shows that, indeed, the constructor is called once (in the original pid) and the destructor is called 9 times (once for each copy made = 2 or 3 times per pool-worker-process as needed, plus once in the original process). This is often OK, as in this case, since the default pickler makes a copy of the entire instance and (semi-)secretly re-populates it in the worker by copying the instance's attributes across. That's why, even though the destructor is called eight times in the three worker processes, it counts down from 1 to 0 each time. But of course you can still get into trouble this way; if necessary, you can provide your own __setstate__, for instance:
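A minimal sketch of what that can look like, assuming a per-instance count attribute like the one under discussion (the names are illustrative):

class someClass(object):
    def __init__(self):
        self.count = 1   # per-instance counter; each copy starts at 1

    def __del__(self):
        self.count -= 1  # each copy counts down from 1 to 0 on its own

    def __setstate__(self, state):
        # The default pickler effectively performs the update below when
        # re-populating a copy in a worker; overriding lets you intervene.
        self.__dict__.update(state)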
In this simple case, where someClass.f is not inheriting any data from the class and not attaching anything to the class, a possible solution would be to separate out f so it can be pickled, as sketched below. (Update: as of the day of this writing, namedtuples are picklable, starting with Python 2.7.)
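A minimal sketch of that separation, with f moved out to module level so the pickler can find it by name:

import multiprocessing

def f(x):
    # Module-level functions are pickled by reference, so the pool can
    # hand this to its worker processes without trouble.
    return x*x

class someClass(object):
    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print pool.map(f, range(10))

if __name__ == '__main__':
    someClass().go()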
The issue here is that the child processes aren't able to import the class of the object (in this case, the class P). In a multi-module project, the class P should be importable anywhere the child processes are used. A quick workaround is to make it importable by assigning it to globals():
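A minimal sketch of that workaround, assuming the workers are forked from the parent (as on Unix); the class P defined inside a function stands in for any class the children can't import on their own:

import multiprocessing

def make_p():
    # P is defined in a local scope, so pickle's lookup of '__main__.P'
    # would normally fail...
    class P(object):
        def __init__(self, x):
            self.x = x
    # ...unless we publish it at module level under its own name.
    globals()['P'] = P
    return P

def get_x(p):
    return p.x

if __name__ == '__main__':
    P = make_p()
    pool = multiprocessing.Pool(processes=2)
    print pool.map(get_x, [P(1), P(2), P(3)])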
You could also define a __call__() method inside your someClass(), which calls someClass.go(), and then pass an instance of someClass() to the pool. This object is picklable and it works fine (for me); see the sketch below.
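A minimal sketch of that approach; here __call__ delegates to f (one call per work item) rather than to go itself:

import multiprocessing

class someClass(object):
    def f(self, x):
        return x*x

    def __call__(self, x):
        # The instance is picklable even though its bound methods are
        # not, so the pool can map over the instance itself.
        return self.f(x)

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print pool.map(self, range(10))

if __name__ == '__main__':
    someClass().go()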
A potentially trivial solution to this is to switch to using multiprocessing.dummy. This is a thread-based implementation of the multiprocessing interface that doesn't seem to have this problem in Python 2.7. I don't have a lot of experience here, but this quick import change allowed me to call apply_async on a class method.

A few good resources on multiprocessing.dummy:

https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy
http://chriskiehl.com/article/parallelism-in-one-line/
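The import change in question, as a minimal sketch; because the workers are threads sharing the parent's memory, nothing is pickled and bound methods work directly:

from multiprocessing.dummy import Pool  # threads behind the Pool interface

class someClass(object):
    def f(self, x):
        return x*x

    def go(self):
        pool = Pool(processes=4)
        result = pool.apply_async(self.f, (3,))
        print result.get()

if __name__ == '__main__':
    someClass().go()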
All of these solutions are ugly, because multiprocessing and pickling are broken and limited unless you jump outside the standard library.

If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python. pathos.multiprocessing also provides an asynchronous map function, and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See: What can multiprocessing and dill do together?
and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

And just to be explicit, you can do exactly what you wanted to do in the first place, and you can do it from the interpreter, if you wanted to.

Get the code here: https://github.com/uqfoundation/pathos
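A minimal sketch of that usage, assuming pathos is installed; ProcessingPool is the name used in pathos's examples, though the exact import path can vary between versions:

import math
from pathos.multiprocessing import ProcessingPool as Pool

class someClass(object):
    def f(self, x):
        return x*x

    def go(self):
        pool = Pool(4)
        # dill serializes the bound method self.f directly.
        print pool.map(self.f, range(10))

if __name__ == '__main__':
    someClass().go()
    # Multiple argument lists are zipped element-wise:
    print Pool(4).map(math.pow, [1, 2, 3], [4, 5, 6])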