I have an object gui_project which has an attribute .namespace, which is a namespace dict (i.e., a dict from strings to objects).
(This is used in an IDE-like program to let the user define their own objects in a Python shell.)
I want to pickle this gui_project, along with the namespace. The problem is that some objects in the namespace (i.e., values of the .namespace dict) are not picklable. For example, some of them refer to wxPython widgets.
I'd like to filter out the unpicklable objects, that is, exclude them from the pickled version.
How can I do this?
(One thing I tried is to go through the values one by one and try to pickle each of them, but some infinite recursion happened, and I need to be safe from that.)
(I do implement a GuiProject.__getstate__ method right now, to get rid of other unpicklable stuff besides namespace.)
This is how I would do this (I did something similar before and it worked):
Now, when you unpickle, you get back all the variables that were originally picklable. For all variables that were not picklable, you now have a list of strings (legal Python code) that, when executed in order, gives you the desired variable.
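A minimal sketch of that idea, assuming the shell records the source lines that created each variable in a dict (creation_log below is hypothetical, as are all the function names). Picklable values are stored as-is; for everything else only the recorded lines are stored, and they are re-executed after unpickling:

```python
import pickle
import threading


def is_picklable(value):
    """Return True if value can be pickled on its own."""
    try:
        pickle.dumps(value)
    except Exception:  # PicklingError, TypeError, RecursionError, ...
        return False
    return True


def split_namespace(namespace, creation_log):
    """Split a namespace into picklable values and re-creation recipes.

    creation_log is hypothetical: a dict the shell would have to maintain,
    mapping each variable name to the source lines that created it.
    """
    values, recipes = {}, {}
    for name, value in namespace.items():
        if is_picklable(value):
            values[name] = value
        else:
            recipes[name] = list(creation_log.get(name, []))
    return values, recipes


def rebuild_namespace(values, recipes):
    """Restore the picklable values, then re-run each recipe in order."""
    namespace = dict(values)
    for lines in recipes.values():
        for line in lines:
            exec(line, namespace)
    namespace.pop("__builtins__", None)  # exec() inserts this key
    return namespace


# Usage: a lock stands in for an unpicklable widget.
namespace = {"n": 42, "lock": threading.Lock()}
creation_log = {"lock": ["import threading", "lock = threading.Lock()"]}
blob = pickle.dumps(split_namespace(namespace, creation_log))
restored = rebuild_namespace(*pickle.loads(blob))
```

Note that the picklability check is all-or-nothing per top-level value, so a list holding one widget is rebuilt entirely from its recipe.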
Hope this helps.
The filtering part is indeed tricky. Using simple tricks, you can easily get the pickle to work. However, you might end up filtering out too much and losing information that you could keep if the filter looked a little deeper. But the sheer variety of things that can end up in .namespace makes building a good filter difficult.

However, we could leverage pieces that are already part of Python, such as deepcopy in the copy module.

I made a copy of the stock copy module and did the following things:

1. Created a new type, LostObject, to represent an object that will be lost in pickling.
2. Changed _deepcopy_atomic to make sure x is picklable. If it's not, it returns an instance of LostObject.
3. Objects may define __reduce__ and/or __reduce_ex__ to provide a hint about whether and how to pickle them. We make sure these methods do not raise an exception, which would be a hint that the object cannot be pickled.

The following is the diff:
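As a rough, self-contained stand-in for the patched copy module (a hand-written recursive walk, not the answer's diff), the sketch below shows the core idea: a memo keyed by id(), as copy.deepcopy uses, keeps shared and circular references intact; picklable leaves are kept without being copied; unpicklable leaves become LostObject placeholders. filtered_copy and _is_picklable are hypothetical names. It only descends into plain dicts and lists, so an unpicklable object buried inside, say, a tuple or a custom instance takes that whole container with it.

```python
import pickle
import threading


class LostObject:
    """Placeholder for a value that could not be pickled."""

    def __init__(self, original):
        self.type_name = type(original).__qualname__

    def __repr__(self):
        return f"<LostObject: {self.type_name}>"


def _is_picklable(obj):
    try:
        pickle.dumps(obj)
    except Exception:  # includes RecursionError from runaway __reduce__
        return False
    return True


def filtered_copy(obj, memo=None):
    """Copy plain dicts and lists, replacing unpicklable leaves.

    memo maps id(original) -> copy, so shared references stay shared
    and reference cycles terminate instead of recursing forever.
    """
    if memo is None:
        memo = {}
    key = id(obj)
    if key in memo:
        return memo[key]
    if type(obj) is dict:
        result = {}
        memo[key] = result  # register before recursing, for cycles
        for k, v in obj.items():
            result[filtered_copy(k, memo)] = filtered_copy(v, memo)
        return result
    if type(obj) is list:
        result = []
        memo[key] = result
        for item in obj:
            result.append(filtered_copy(item, memo))
        return result
    # Anything else is a leaf: kept as-is (not copied) if it pickles,
    # replaced by a LostObject otherwise.
    result = obj if _is_picklable(obj) else LostObject(obj)
    memo[key] = result
    return result


# Usage: a and b point at each other; the lock cannot be pickled.
a = [1, 2]
b = {"peer": a}
a.append(b)
namespace = {"a": a, "b": b, "lock": threading.Lock()}
restored = pickle.loads(pickle.dumps(filtered_copy(namespace)))
```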
Now back to the pickling part. You simply make a deepcopy using this new deepcopy function and then pickle the copy. The unpicklable parts have been removed during the copying process. Here is the output:
You see that 1) mutual pointers (between x and xx) are preserved and we do not run into an infinite loop; 2) the unpicklable file object is converted to a LostObject instance; and 3) no new copy of the large object is created, since it is picklable.

I ended up coding my own solution to this, using Shane Hathaway's approach.
Here's the code. (Look for CutePickler and CuteUnpickler.) Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.

If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

I would use the pickler's documented support for persistent object references. Persistent object references are objects that are referenced by the pickle but not stored in the pickle.
http://docs.python.org/library/pickle.html#pickling-and-unpickling-external-objects
ZODB has used this API for years, so it's very stable. When unpickling, you can replace the object references with anything you like. In your case, you would want to replace the object references with markers indicating that the objects could not be pickled.
You could start with something like this (untested):
Then just call dump_filtered() and load_filtered() instead of pickle.dump() and pickle.load(). wxPython objects will be pickled as persistent IDs, to be replaced with FilteredObjects at unpickling time.
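A minimal Python 3 sketch of this approach, using the documented persistent_id() and persistent_load() hooks on pickle.Pickler and pickle.Unpickler subclasses. dump_filtered, load_filtered and FilteredObject are the names used above; FilteringPickler, FilteringUnpickler and FILTERED_TYPES are hypothetical, and a lock type stands in for wxPython widgets so the example runs without wx installed.

```python
import io
import pickle
import threading

# Hypothetical filter. For the wxPython case this might be (wx.Object,);
# here a lock type stands in so the sketch runs without wx.
FILTERED_TYPES = (type(threading.Lock()),)


class FilteredObject:
    """Marker that replaces an object that was not stored in the pickle."""

    def __init__(self, description):
        self.description = description

    def __repr__(self):
        return f"FilteredObject({self.description!r})"


class FilteringPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # A non-None return value is stored as a persistent ID instead of
        # pickling obj itself; None means "pickle obj normally".
        if isinstance(obj, FILTERED_TYPES):
            return ("filtered", type(obj).__qualname__)
        return None


class FilteringUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, description = pid
        if tag == "filtered":
            return FilteredObject(description)
        raise pickle.UnpicklingError(f"unsupported persistent id {pid!r}")


def dump_filtered(obj, file):
    FilteringPickler(file, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)


def load_filtered(file):
    return FilteringUnpickler(file).load()


# Usage
namespace = {"numbers": [1, 2, 3], "lock": threading.Lock()}
buffer = io.BytesIO()
dump_filtered(namespace, buffer)
buffer.seek(0)
restored = load_filtered(buffer)
```

Because persistent_id() is consulted for every object the pickler visits, a filtered object nested inside a list or a custom instance is replaced in place rather than taking its container with it.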
You could make the solution more generic by filtering out objects that are not of the built-in types and have no __getstate__ method.

Update (15 Nov 2010): Here is a way to achieve the same thing with wrapper classes. Using wrapper classes instead of subclasses, it's possible to stay within the documented API.
One approach would be to inherit from pickle.Pickler and override the save_dict() method. Copy it from the base class, which reads like this:

However, in the call to _batch_setitems, pass an iterator that filters out all items that you don't want to be dumped, e.g.:
As save_dict isn't an official API, you need to check for each new Python version whether this override is still correct.
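A sketch of this idea for Python 3, where pickle.Pickler is the C implementation and has no save_dict(); the pure-Python class is the private pickle._Pickler. Rather than copying save_dict() wholesale, this overrides _batch_setitems, which save_dict() calls with obj.items(), and filters the items there. DictFilteringPickler and _keep are hypothetical names, and both hooks are CPython internals, so the version caveat above applies here too.

```python
import io
import pickle
import threading


def _keep(value):
    """Decide whether a dict value should be pickled."""
    if type(value) is dict:
        return True  # nested dicts are filtered when they are saved in turn
    try:
        pickle.dumps(value)
    except Exception:
        return False
    return True


class DictFilteringPickler(pickle._Pickler):
    """Pure-Python pickler that silently drops unpicklable dict values."""

    def _batch_setitems(self, items, *args):
        # *args passes through any extra parameters newer Pythons add.
        kept = ((k, v) for k, v in items if _keep(v))
        return super()._batch_setitems(kept, *args)


# Usage
namespace = {
    "a": [1, 2],
    "lock": threading.Lock(),
    "inner": {"ok": 1, "lock": threading.Lock()},
}
buffer = io.BytesIO()
DictFilteringPickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(namespace)
restored = pickle.loads(buffer.getvalue())
```

Only dict values are filtered: an unpicklable object that also appears in a list outside any dict will still make dump() fail, and every kept value is pickled twice (once for the check), which costs time on large namespaces.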