I'm aware of various discussions of the limitations of the multiprocessing module when dealing with functions that are methods of a class (due to pickling problems).
But is there another module, or any sort of work-around in multiprocessing, that allows something like the following (specifically, without forcing the definition of the function that is applied in parallel to exist outside of the class)?
    class MyClass():
        def __init__(self):
            self.my_args = [1,2,3,4]
            self.output = {}

        def my_single_function(self, arg):
            return arg**2

        def my_parallelized_function(self):
            # Use map or map_async to map my_single_function onto the
            # list of self.my_args, and append the return values into
            # self.output, using each arg in my_args as the key.
            # The result should make self.output become
            # {1:1, 2:4, 3:9, 4:16}
            pass  # placeholder: this is the part being asked about

    foo = MyClass()
    foo.my_parallelized_function()
    print(foo.output)
Note: I can easily do this by moving my_single_function outside of the class, and passing something like foo.my_args to the map or map_async commands. But this pushes the parallelized execution of the function outside of instances of MyClass.
For my application (parallelizing a large data query that retrieves, joins, and cleans monthly cross-sections of data, and then appends them into a long time-series of such cross-sections), it is very important to have this functionality inside the class since different users of my program will instantiate different instances of the class with different time intervals, different time increments, different sub-sets of data to gather, and so on, that should all be associated with that instance.
Thus, I want the work of parallelizing to also be done by the instance, since it owns all the data relevant to the parallelized query, and it would just be silly to try to write some hacky wrapper function that binds to some arguments and lives outside of the class (especially since such a function would be non-general: it would need all kinds of specifics from inside the class).
If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python. pathos.multiprocessing also provides an asynchronous map function, and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).
See: What can multiprocessing and dill do together?
and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
So you can do exactly what you wanted to do, I believe.
Get the code here: https://github.com/uqfoundation/pathos
I believe there is a more elegant solution. Add the following lines to the code that does multiprocessing with the class, and you can still pass the method through the pool. The code should go above the class.
For more on how to pickle a method, see http://docs.python.org/2/library/copy_reg.html
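Steven Bethard has posted a way to allow methods to be pickled/unpickled: register a pickling function for method objects with copy_reg before the class is defined. With that in place, pool.map can be given self.my_single_function directly, and calling foo.my_parallelized_function() yields the {1:1, 2:4, 3:9, 4:16} mapping described in the question.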
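As a hedged sketch of the usage, here is how the question's class might look on Python 3, where bound methods pickle without any registration (on Python 2, the copy_reg registration would have to appear above the class). The two-process pool size and the choice to keep the pool in a local variable are assumptions of this sketch, not details from the original answer.

```python
from multiprocessing import Pool


class MyClass(object):
    def __init__(self):
        self.my_args = [1, 2, 3, 4]
        self.output = {}

    def my_single_function(self, arg):
        return arg ** 2

    def my_parallelized_function(self):
        # Keep the pool in a local variable rather than on self: sending
        # the bound method to a worker pickles self as well, and a Pool
        # object cannot be pickled.
        pool = Pool(processes=2)  # pool size is an arbitrary choice
        try:
            results = pool.map(self.my_single_function, self.my_args)
        finally:
            pool.close()
            pool.join()
        # pool.map preserves input order, so zip pairs each arg with its result.
        self.output = dict(zip(self.my_args, results))


if __name__ == "__main__":
    foo = MyClass()
    foo.my_parallelized_function()
    print(foo.output)  # {1: 1, 2: 4, 3: 9, 4: 16}
```

Because every task pickles the whole instance, it is worth keeping large data off self, or passing only what each worker needs, when the instance holds big query results.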