Store object using Python pickle, and load it into different namespace

Posted 2020-03-13 05:29

Question:

I'd like to pass object state between two Python programs (one is my own code running standalone, the other is a Pyramid view) that live in different namespaces. Somewhat related questions are here or here, but I can't quite follow through with them for my scenario.

My own code defines a global class (i.e. __main__ namespace) of somewhat complexish structure:

# An instance of this is a colorful mess of nested lists and sets and dicts.
class MyClass:
    def __init__(self):
        # Attributes must be set on self, otherwise they are just locals.
        self.data = set()
        self.more = dict()
        ...

    def do_sth(self):
        ...

At some point I pickle an instance of this class:

import pickle

c = MyClass()
# Fill c with data.

# Pickle and write the MyClass instance within the __main__ namespace.
# Protocol -1 selects the highest protocol available.
with open("my_c.pik", "wb") as f:
    pickle.dump(c, f, -1)

A hexdump -C my_c.pik shows that the first couple of bytes contain __main__.MyClass from which I assume that the class is indeed defined in the global namespace, and that this is somehow a requirement for reading the pickle. Now I'd like to load this pickled MyClass instance from within a Pyramid view, which I assume is a different namespace:

# In Pyramid (different namespace) read the pickled MyClass instance.
with open("my_c.pik", "rb") as f :
    c = pickle.load(f)

But that results in the following error:

File ".../views.py", line 60, in view_handler_bla
  c = pickle.load(f)
AttributeError: 'module' object has no attribute 'MyClass'

It seems to me that the MyClass definition is missing in whatever namespace the view code executes? I had hoped (assumed) that pickling is a somewhat opaque process which allows me to read a blob of data into whichever place I choose. (More on Python's class names and namespaces is here.)

How can I handle this properly? (Ideally without having to import stuff across...) Can I somehow find the current namespace and inject MyClass (like this answer seems to suggest)?

Poor Solution

It seems to me that if I refrain from defining and using MyClass and instead fall back to plain built-in datatypes, this wouldn't be a problem. In fact, I could "serialize" the MyClass object into a sequence of calls that pickle the individual elements of the MyClass instance:

# 'Manual' serialization of c works, because all elements are built-in types.
pickle.dump(c.data, f, -1)
pickle.dump(c.more, f, -1)
...

This would defeat the purpose of wrapping data into classes though.
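For completeness, here is a rough sketch of the reading side of this manual approach. It assumes the pieces were dumped in the order shown above, uses an in-memory buffer in place of my_c.pik, and the sample values are made up for illustration.

```python
import io
import pickle

# Illustrative stand-ins for c.data and c.more (made-up values).
data = {1, 2, 3}
more = {"a": [1, 2], "b": {"nested": True}}

# In-memory buffer instead of the my_c.pik file from the question.
buf = io.BytesIO()
pickle.dump(data, buf, -1)  # -1 selects the highest available protocol
pickle.dump(more, buf, -1)

# Each pickle.load() call consumes exactly one pickled object,
# so the pieces come back in the order they were written.
buf.seek(0)
loaded_data = pickle.load(buf)
loaded_more = pickle.load(buf)
```

The reader has to know the order and number of dumps, which is part of why this feels fragile compared to pickling one object.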

Note

Pickling stores only the state of an instance (its attributes), not the methods defined in its class (e.g. do_sth() in the above example). That means that loading a MyClass instance into a different namespace, where the class found under that name lacks those methods, restores only the instance data; calling a missing method like do_sth() raises an AttributeError.
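A small sketch that makes this visible by looking at the pickled bytes directly. It uses a simplified MyClass with made-up data, and it only checks which names appear in the byte string; it is an illustration, not a description of the full pickle format.

```python
import pickle


class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}

    def do_sth(self):
        return len(self.data)


blob = pickle.dumps(MyClass())

# The class is recorded only as a reference: module name plus class name.
has_class_name = b"MyClass" in blob
has_module_name = MyClass.__module__.encode("ascii") in blob
# The attribute name from the instance __dict__ is stored with its value.
has_attr_name = b"data" in blob
# Nothing about the method do_sth() is written to the pickle.
has_method_name = b"do_sth" in blob
```

This matches what the hexdump in the question shows: the loader later has to find a class under the recorded module and name.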

Answer 1:

Use dill instead of pickle: for classes defined in __main__, dill by default serializes the class definition itself rather than a reference to it.

>>> import dill
>>> class MyClass:
...   def __init__(self): 
...     self.data = set()
...     self.more = dict()
...   def do_stuff(self):
...     return sorted(self.more)
... 
>>> c = MyClass()
>>> c.data.add(1)
>>> c.data.add(2)
>>> c.data.add(3)
>>> c.data
set([1, 2, 3])
>>> c.more['1'] = 1
>>> c.more['2'] = 2
>>> c.more['3'] = lambda x:x
>>> def more_stuff(x):
...   return x+1
... 
>>> c.more_stuff = more_stuff
>>> 
>>> with open('my_c.pik', "wb") as f:
...   dill.dump(c, f)
... 
>>> 

Shut down the session, and restart in a new session…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('my_c.pik', "rb") as f:
...   c = dill.load(f)
... 
>>> c.data
set([1, 2, 3])
>>> c.more
{'1': 1, '3': <function <lambda> at 0x10473ec80>, '2': 2}
>>> c.do_stuff()
['1', '2', '3']
>>> c.more_stuff(5)
6

Get dill here: https://github.com/uqfoundation/dill



Answer 2:

Solution 1

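When pickle.load runs, the module recorded in the pickle (here __main__) needs to have a function or class called MyClass. This does not need to be the original class with the original source code; the instance data is restored into whatever class is found under that name, so you can put other methods in it. Note that __main__ is the script that started the process, so the snippet below works as written only if it runs in that script.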

import pickle

# Stand-in class; it must be reachable as __main__.MyClass when loading.
class MyClass(object):
    pass

with open("my_c.pik", "rb") as f:
    c = pickle.load(f)
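If the loading code is not itself the __main__ script (as in a Pyramid view), one way to get the same effect without touching __main__ is to override Unpickler.find_class and redirect the recorded name to a local class. The following is a minimal sketch that simulates both sides in one process; ViewSideMyClass and RedirectingUnpickler are hypothetical names, and in the question's setup the lookup key would be ("__main__", "MyClass").

```python
import io
import pickle


class MyClass:
    """Stands in for the class defined in the standalone program."""

    def __init__(self):
        self.data = {1, 2, 3}


class ViewSideMyClass:
    """Stands in for a replacement class defined next to the view code."""

    def do_sth(self):
        return sorted(self.data)


# Producer side: the pickle records (module name, class name).
producer_module = MyClass.__module__  # "__main__" when run as a script
blob = pickle.dumps(MyClass())


class RedirectingUnpickler(pickle.Unpickler):
    """Hypothetical helper: map recorded class references to local classes."""

    redirects = {(producer_module, "MyClass"): ViewSideMyClass}

    def find_class(self, module, name):
        target = self.redirects.get((module, name))
        if target is not None:
            return target
        # Everything else (e.g. builtins such as set) resolves as usual.
        return super().find_class(module, name)


# Consumer side: the instance data lands in the local replacement class.
c = RedirectingUnpickler(io.BytesIO(blob)).load()
```

The loaded object is an instance of the replacement class carrying the original attributes, so only the methods defined on the replacement are available.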

Solution 2

Use the copyreg module (named copy_reg in Python 2), which registers constructors and reduction functions used to pickle specific types. This is the example given by the module for a complex number:

import copyreg

# Reduction function: returns a callable and the arguments to rebuild the object.
def pickle_complex(c):
    return complex, (c.real, c.imag)

copyreg.pickle(complex, pickle_complex, complex)
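The same idea applied to a user-defined class might look like the sketch below. Point, rebuild_point and reduce_point are hypothetical names chosen for illustration. Note that the reconstructor function is itself stored by reference (module and name), so it still has to be importable on the loading side, ideally from a module both programs share.

```python
import copyreg
import pickle


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def rebuild_point(x, y):
    """Reconstructor called at load time with the saved arguments."""
    return Point(x, y)


def reduce_point(p):
    """Reduction function: (callable, args) used to rebuild p."""
    return rebuild_point, (p.x, p.y)


# Register the reduction function for Point instances.
copyreg.pickle(Point, reduce_point)

blob = pickle.dumps(Point(1, 2))
restored = pickle.loads(blob)
```

With this registration the pickle refers to rebuild_point instead of recording the class for direct instantiation, which gives the loading side one well-defined place to decide what object to build.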

Solution 3

Override the persistent_id method of the Pickler and the persistent_load method of the Unpickler. pickler.persistent_id(obj) shall return an identifier (or None to pickle obj normally) that unpickler.persistent_load(pid) can resolve back to the object.
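A minimal sketch of this approach, assuming the instance state can be captured in a small tuple. The Unpickler-side hook is persistent_load; StatePickler, StateUnpickler and the ("MyClass", state) tag format are made up for this illustration.

```python
import io
import pickle


class MyClass:
    def __init__(self, data):
        self.data = data


class StatePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Replace MyClass instances by a tagged tuple of plain data.
        if isinstance(obj, MyClass):
            return ("MyClass", obj.data)
        return None  # everything else is pickled as usual


class StateUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Rebuild the object from the tag using whatever class is local.
        tag, state = pid
        if tag == "MyClass":
            return MyClass(state)
        raise pickle.UnpicklingError("unsupported persistent id: %r" % (pid,))


buf = io.BytesIO()
StatePickler(buf).dump({"label": "demo", "obj": MyClass({1, 2, 3})})

buf.seek(0)
restored = StateUnpickler(buf).load()
```

Because the pickle contains only the tag and built-in data, the loading side is free to map the tag to any class it likes, regardless of the module in which the original class was defined.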



Answer 3:

The easiest solution is to use cloudpickle:

https://github.com/cloudpipe/cloudpickle

It enabled me to easily send a pickled class file to another machine and unpickle it using cloudpickle again.