I am sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in the IPython shell instead of the regular Python interpreter, everything works.
I looked up some previous posts on this problem. They were all caused by using a pool to call a function defined within a class method, but that is not the case for me.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I would appreciate any help.
UPDATE: The function I pickle is defined at the top level of the module, though it calls a function that contains a nested function. That is, f() calls g(), which calls h(), which has a nested function i(), and I am calling pool.apply_async(f). f(), g() and h() are all defined at the top level. I tried a simpler example with this pattern, and it works.
As others have said, multiprocessing can only transfer Python objects to worker processes which can be pickled. If you cannot reorganize your code as described by unutbu, you can use dill's extended pickling/unpickling capabilities for transferring data (especially code data) as I show below.

This solution requires only the installation of dill and no other libraries such as pathos:

It also works for numpy arrays.
I had the same issue and lots of others with multiprocessing. I decided to put what I've learned into a small open source python script that I called multiprocessing for kids. I think it makes using multiprocessing really easy. You can find it on GitHub:
https://github.com/predictedblog/multiprocessing_for_kids.
I also wrote 2 blog posts with examples on how to use it:
https://predicted.blog/multiprocessing-for-kids/
https://predicted.blog/multiprocessing-for-kids-shared-variables/
You use a function called doMultiprocessingLoop(yourFunction, Iterator) to run yourFunction in multiple processes.
I just want to help people that run into the same issues using multiprocessing over and over as I did. It works for a lot of simple use cases like sharing variables between processes and returning values from them. Even termination of all processes by returning a result is possible. Please read the mentioned blog posts for further details. Reading them will at least give you a better understanding of how multiprocessing works and where the limits are.
Don't hesitate to edit the script if you want to add functionality. Pull requests are also welcome. If you want to make something bigger out of this feel free to contact me and I will give you admin permissions.
This error will also occur if the model object that was passed to the async job has any inbuilt function inside it.
So make sure the model objects you pass don't have inbuilt functions. (In our case we were using the FieldTracker() function of django-model-utils inside the model to track a certain field.) Here is the link to the relevant GitHub issue.

I'd use pathos.multiprocessing instead of multiprocessing. pathos.multiprocessing is a fork of multiprocessing that uses dill. dill can serialize almost anything in Python, so you are able to send a lot more around in parallel. The pathos fork also has the ability to work directly with multiple-argument functions, as you need for class methods.

Get pathos (and if you like, dill) here: https://github.com/uqfoundation

Here is a list of what can be pickled. In particular, functions are only picklable if they are defined at the top level of a module.
This piece of code:
yields an error almost identical to the one you posted:
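Neither the snippet nor its traceback appears above, so what follows is only a rough sketch of the failure, not the original code. It assumes Python 3, where bound methods of top-level classes can be pickled, so a nested function and a lambda stand in for the unpicklable callable. The names top_level, make_nested and is_picklable are made up for illustration. The sketch calls pickle.dumps directly, which is much the same step that fails when a task is sent to a worker:

```python
import pickle


def top_level(x):
    """Defined at module top level, so pickle can store it by name."""
    return x + 1


def make_nested():
    """Return a function that is defined inside another function."""
    def nested(x):
        return x + 1
    return nested


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling fails."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the Python version, an unpicklable function raises
        # PicklingError or AttributeError.
        return False
    return True


checks = {
    "top_level": is_picklable(top_level),
    "nested": is_picklable(make_nested()),
    "lambda": is_picklable(lambda x: x + 1),
}
print(checks)
```

Only top_level is reported as picklable. Note that a nested function causes trouble only when it is itself part of what gets pickled (the callable or one of its arguments); a top-level function that merely defines and calls a nested function internally still pickles fine.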
The problem is that the pool methods all use a queue to pass tasks to the worker processes. Everything that goes through that queue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module.

It can be fixed by defining a function at the top level, which calls foo.work():

Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
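The wrapper itself is not shown above. A minimal sketch of the idea might look like the following, assuming a Foo class with a value attribute and a work method that doubles it (both invented for this example). Instead of starting a pool, it round-trips the task through pickle to mimic what happens when the task is handed to a worker:

```python
import pickle


class Foo:
    """Top-level class (hypothetical body), so its instances can be pickled."""

    def __init__(self, value):
        self.value = value

    def work(self):
        return self.value * 2


def work(foo):
    """Top-level wrapper: picklable by name, and it forwards to the method."""
    return foo.work()


# Simulate sending a task to a worker: pickle (function, args) on one side,
# then unpickle and call it on the other side.
task = pickle.dumps((work, (Foo(21),)))
func, args = pickle.loads(task)
result = func(*args)
print(result)
```

With a real pool, the equivalent call would be pool.apply_async(work, args=(foo,)).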