Serialize SWIG extension with dill

Posted 2019-08-10 20:30

Question:

Recently, I was asked to make "our C++ lib work in the cloud". The lib is compute-intensive (it calculates prices), so that would make sense. I have built a SWIG interface to produce a Python version, with the intention of using MapReduce with MRJob. I wanted to serialize the objects to a file and then, in a mapper, deserialize them and calculate the price.

For example:

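import dill
from mrjob.job import MRJob

class MRTest(MRJob):
    def mapper(self, key, value):
        # value holds one dill-serialized object; rebuild it and price it
        obj = dill.loads(value)
        yield (key, obj.price())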

But now I have reached a dead end, since it seems that dill cannot handle SWIG extensions:

PicklingError: Can't pickle <class 'SwigPyObject'>: it's not found as builtins.SwigPyObject

Is there a way to make this work properly?

Answer 1:

I'm the dill author. That's correct: dill can't pickle C++ objects. When you see "it's not found as builtins.some_object…", that almost invariably means you are trying to pickle an object that is not written in Python but uses Python to bind to C/C++ (i.e. an extension type). You have no hope of directly pickling such objects with a Python serializer.

However, since you are interested in pickling a subclass of an extension type, you can actually do it. All you need to do is give your object the state you want to save as one or more instance attributes, and provide a __reduce__ method that tells dill (or pickle) how to save the state of your object. This method is how Python deals with serializing extension types. See: https://docs.python.org/2/library/pickle.html#pickling-and-unpickling-extension-types

There are probably better examples, but here's at least one example: https://stackoverflow.com/a/19874769/4646678
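
As a rough illustration of the __reduce__ approach described above, here is a minimal sketch. It does not use a real SWIG module: NativeOption is a hypothetical stand-in for a SWIG-generated proxy class (it deliberately refuses to be pickled), and PicklableOption, strike, and spot are made-up names, assuming the C++ object can be rebuilt from its constructor arguments. The standard pickle module is used so the sketch runs without extra packages; dill should honor the same __reduce__ protocol.

```python
import pickle


class NativeOption:
    """Hypothetical stand-in for a SWIG proxy class; it refuses to be pickled."""

    def __init__(self, strike, spot):
        self._strike = strike
        self._spot = spot

    def price(self):
        # Toy payoff, only so that the example has something to compute.
        return max(self._spot - self._strike, 0.0)

    def __reduce__(self):
        # Mimic the failure seen when pickling a real extension object.
        raise pickle.PicklingError("Can't pickle native object")


class PicklableOption(NativeOption):
    """Subclass that keeps its constructor arguments as plain Python attributes."""

    def __init__(self, strike, spot):
        super().__init__(strike, spot)
        # State to save: everything needed to rebuild the native object.
        self.strike = strike
        self.spot = spot

    def __reduce__(self):
        # Tell pickle (or dill) to rebuild by calling PicklableOption(strike, spot).
        return (PicklableOption, (self.strike, self.spot))


payload = pickle.dumps(PicklableOption(100.0, 105.5))
restored = pickle.loads(payload)
print(restored.price())
```

The native object itself is never written to the byte stream; only the constructor arguments are saved, and unpickling builds a fresh object from them. This only works if those arguments fully determine the state you care about.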