How to get around the pickling error of python mul

Posted 2019-03-28 06:03

Question:

I've researched this question multiple times, but haven't found a workaround that either works in my case, or one that I understand, so please bear with me.

Basically, I have a hierarchical organization of functions, and that is preventing me from multiprocessing in the top-level. Unfortunately, I don't believe I can change the layout of the program - because I need all the variables that I create after the initial inputs.

For example, say I have this:

import multiprocessing

def calculate(x):
  # here is where I would take this input x (and maybe a couple more inputs)
  # and build a larger library of variables that I use further down the line

  def domath(y):
    return x * y

  pool = multiprocessing.Pool(3)
  final = pool.map(domath, range(3))

calculate(2)

This yields the following error:

Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I was thinking of globals, but I'm afraid that I'd have to define too many and that may slow my program down quite a bit. Is there any workaround without having to restructure the whole program?

Answer 1:

You could use pathos.multiprocessing, a fork of multiprocessing that uses the dill serializer instead of pickle. dill can serialize almost anything in Python, including nested functions, so apart from the import your code can stay as it is.

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> def calculate(x):
...   def domath(y):
...     return x*y
...   return Pool().map(domath, range(3))
... 
>>> calculate(2)
[0, 2, 4]

You can even go nuts with it, since most things can be pickled. There's no need for the odd, non-Pythonic workarounds you have to cook up with pure multiprocessing.

>>> class Foo(object):
...   def __init__(self, x):
...     self.x = x
...   def doit(self, y):
...     return Pool().map(self.squared, calculate(y+self.x))
...   def squared(self, z):
...     return z*z
... 
>>> def thing(obj, y):
...   return getattr(obj, 'doit')(y)
... 
>>> Pool().map(thing, Pool().map(Foo, range(3)), range(3))
[[0, 0, 0], [0, 4, 16], [0, 16, 64]]

Get pathos here: https://github.com/uqfoundation



Answer 2:

The problem you encountered is actually a feature. pickle does not serialize a function's code; it stores only a reference to the function's importable name, and a nested function has no such name. This design helps keep executable code out of pickled data, so please consider the security implications before using the workaround below.
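
To see the restriction in action, here is a minimal sketch showing that pickle accepts a module-level function but rejects a nested one. The names can_pickle, top_level_domath and make_nested are made up for illustration. It assumes Python 3, where the failure may surface as AttributeError or pickle.PicklingError depending on the version and implementation.

```python
import pickle

def top_level_domath(y):
    # Defined at module level, so pickle can store it as "module.name".
    return 2 * y

def make_nested(x):
    # Returns a nested function that closes over x, like domath in the question.
    def domath(y):
        return x * y
    return domath

def can_pickle(obj):
    # Hypothetical helper: report whether pickle.dumps succeeds on obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

if __name__ == "__main__":
    print(can_pickle(top_level_domath))  # module-level function
    print(can_pickle(make_nested(2)))    # nested function
```

multiprocessing.Pool.map pickles the function it sends to the workers, which is why the nested domath in the question fails.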

First off we have some imports.

import marshal
import pickle
import types

Here we have a function which takes in a function as an argument, pickles the parts of the object, then returns a tuple containing all the parts:

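def pack(fn):
    # marshal handles code objects, which pickle cannot serialize
    code = marshal.dumps(fn.__code__)
    name = pickle.dumps(fn.__name__)
    defs = pickle.dumps(fn.__defaults__)
    # works only when fn has no closure (__closure__ is None);
    # closure cell objects themselves are not picklable
    clos = pickle.dumps(fn.__closure__)
    return (code, name, defs, clos)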

Next we have a function that takes the four parts of our converted function, deserializes them, and builds and returns a function from them. Note that globals are supplied anew here, from the module doing the unpacking, because the packing step does not capture them:

def unpack(code, name, defs, clos):
    code = marshal.loads(code)
    glob = globals()
    name = pickle.loads(name)
    defs = pickle.loads(defs)
    clos = pickle.loads(clos)
    return types.FunctionType(code, glob, name, defs, clos)

Here we have a test function. Notice that the import sits inside the function body, because globals are not carried through our pickling process:

def test_function(a, b):
    from random import randint
    return randint(a, b)

Next we pack our test function and print the result to make sure everything is working:

packed = pack(test_function)
print(packed)

Lastly, we unpack our function, assign it to a variable, call it, and print its output:

unpacked = unpack(*packed)
print(unpacked(2, 20))
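
The test function above has no closure, whereas the domath in the question closes over x. As a rough sketch of how the same idea might be extended to that case, the variant below pickles the values held in the closure cells and rebuilds the cells on the other side. The names pack_with_closure, unpack_with_closure and call_packed are hypothetical, and the sketch assumes Python 3.8 or later (for types.CellType), picklable closure values, and the same Python version on both ends, since marshal output is version-specific.

```python
import marshal
import pickle
import types

def pack_with_closure(fn):
    # Serialize the code object with marshal; pickle cannot handle it.
    code = marshal.dumps(fn.__code__)
    # Cell objects are not picklable, so pickle the values they hold.
    cells = fn.__closure__ or ()
    values = pickle.dumps([cell.cell_contents for cell in cells])
    return (code, fn.__name__, pickle.dumps(fn.__defaults__), values)

def unpack_with_closure(code, name, defs, values):
    # Rebuild one fresh cell per captured value.
    cells = tuple(types.CellType(v) for v in pickle.loads(values))
    return types.FunctionType(marshal.loads(code), globals(), name,
                              pickle.loads(defs), cells or None)

def call_packed(args):
    # Module-level helper that a pool could map over:
    # it rebuilds the function, then applies it to one argument.
    packed, y = args
    return unpack_with_closure(*packed)(y)

def calculate(x):
    def domath(y):
        return x * y
    return pack_with_closure(domath)

if __name__ == "__main__":
    packed = calculate(2)
    print([call_packed((packed, y)) for y in range(3)])  # [0, 2, 4]
```

Because the packed tuple contains only bytes and a string, it can be pickled, so something like pool.map(call_packed, [(packed, y) for y in range(3)]) should be possible. The same security caveat applies: unpacking executes whatever code was packed.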

Comment if you have any questions.



Answer 3:

How about just taking the embedded function out?

This seems to me the clearest solution (since you didn't give your expected output, I had to guess):

$ cat /tmp/tmp.py
import multiprocessing

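def calculate(x):
    # here is where I would take this input x (and maybe a couple more inputs)
    # and build a larger library of variables that I use further down the line

    pool = multiprocessing.Pool(3)
    # pass x along with each y, since domath no longer closes over x
    _lst = [(x, y) for y in range(3)]
    final = pool.map(domath, _lst)
    pool.close()
    pool.join()
    print(final)

def domath(l):
    # module-level function, so pickle can find it by name in the workers
    return l[0] * l[1]

if __name__ == '__main__':
    # guard needed where workers are spawned rather than forked (e.g. Windows)
    calculate(2)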

$ python /tmp/tmp.py
[0, 2, 4]

$
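
A possible variation on the same idea, sketched here under the assumption of Python 3: keep domath at module level but bind x with functools.partial instead of packing tuples. A partial of a module-level function can be pickled as long as its bound arguments can. This is an alternative to the tuple approach above, not something the original answer proposed.

```python
import multiprocessing
from functools import partial

def domath(x, y):
    # Module-level, so the worker processes can look it up by name.
    return x * y

def calculate(x):
    # partial(domath, x) behaves like the nested domath(y) from the question.
    with multiprocessing.Pool(3) as pool:
        return pool.map(partial(domath, x), range(3))

if __name__ == "__main__":
    print(calculate(2))  # [0, 2, 4]
```

Any extra variables built inside calculate can be bound the same way, which avoids introducing globals.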