(It is possible to directly jump to the question, further down, and to skip the introduction.)
There is a common difficulty with pickling Python objects from user-defined classes:
    # This is program dumper.py
    import pickle

    class C(object):
        pass

    with open('obj.pickle', 'wb') as f:
        pickle.dump(C(), f)
Trying to get the object back from another program, loader.py, with
    # This is program loader.py
    import pickle

    with open('obj.pickle', 'rb') as f:
        obj = pickle.load(f)
results in
    AttributeError: 'module' object has no attribute 'C'
The reason is that the class is pickled by name ("C"), and the loader.py program does not know anything about C. A common solution consists of importing the class:
    import pickle
    from dumper import C  # Objects of class C can now be unpickled

    with open('obj.pickle', 'rb') as f:
        obj = pickle.load(f)
However, this solution has a few drawbacks: all the classes referenced by the pickled objects have to be imported (there can be many), and the local namespace becomes polluted by names from the dumper.py program.
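As a rough illustration of what "pickled by name" means, the sketch below pickles an instance and checks which names end up in the byte stream. The module name dumper_demo is hypothetical and is built in memory only so that the example is self-contained; it is not one of the programs above.

```python
import pickle
import sys
import types

# Hypothetical stand-in for an importable module file (an assumption
# made for this sketch, so that no file on disk is needed).
demo = types.ModuleType("dumper_demo")
sys.modules["dumper_demo"] = demo
exec("class C(object):\n    pass\n", demo.__dict__)

payload = pickle.dumps(demo.C())

# The class records the name of the module it was defined in ...
print(demo.C.__module__)
# ... and that module name appears in the pickled bytes next to "C".
print(b"dumper_demo" in payload, b"C" in payload)

# Unpickling looks the class up again through that module name.
restored = pickle.loads(payload)
print(type(restored) is demo.C)
```

The pickle holds a reference to the class (module name plus class name), not the class definition itself, so the loading side must be able to find that module.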
Now, a solution to this consists of fully qualifying objects prior to pickling:
    # New dumper.py program:
    import pickle
    import dumper  # This is this very program!

    class C(object):
        pass

    with open('obj.pickle', 'wb') as f:
        pickle.dump(dumper.C(), f)  # Fully qualified class
Unpickling with the original loader.py program above now works directly (no need to do from dumper import C).
Question: Now, other classes from dumper.py seem to be automatically fully qualified upon pickling, and I would love to know how this works, and whether this is a reliable, documented behavior:
    import pickle
    import dumper  # This is this very program!

    class D(object):  # New class!
        pass

    class C(object):
        def __init__(self):
            self.d = D()  # *NOT* fully qualified

    with open('obj.pickle', 'wb') as f:
        pickle.dump(dumper.C(), f)  # Fully qualified pickle class
Now, unpickling with the original loader.py program also works (no need to do from dumper import C); print obj.d gives a fully qualified class, which I find surprising:
    <dumper.D object at 0x122e130>
This behavior is very convenient, since only the top-level pickled object has to be fully qualified with the module name (dumper.C()). But is this behavior reliable and documented? How come classes are pickled by name ("D"), yet unpickling decides that the pickled self.d attribute is of class dumper.D (and not some local D class)?
PS: The question, refined: I just noticed a few interesting details that might point to an answer to this question:
In the pickling program dumper.py, print self.d prints <__main__.D object at 0x2af450> with the first dumper.py program (the one without import dumper). On the other hand, doing import dumper and creating the object with dumper.C() in dumper.py makes print self.d print <dumper.D object at 0x2af450>: the self.d attribute is automatically qualified by Python! So, it appears that the pickle module has no role in the nice unpickling behavior described above.
The question is thus really: why does Python convert D() into the fully qualified dumper.D in the second case? Is this documented somewhere?
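For reference, the observation from the PS can be reproduced without any files or pickling. This is only a sketch under one assumption: that running dumper.py as a script and importing it as dumper amount to executing the same source twice, once with __name__ set to "__main__" and once with __name__ set to "dumper". The names script_ns and module_ns are hypothetical and stand for those two namespaces.

```python
# Same source as the classes in dumper.py, executed in two namespaces.
source = """
class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()  # D is looked up in the namespace where C was defined
"""

# Namespace standing for "python dumper.py" (the script itself).
script_ns = {"__name__": "__main__"}
exec(source, script_ns)

# Namespace standing for "import dumper" (the imported copy).
module_ns = {"__name__": "dumper"}
exec(source, module_ns)

# Each copy of C creates the D that lives next to it.
print(type(script_ns["C"]().d).__module__)  # __main__
print(type(module_ns["C"]().d).__module__)  # dumper

# The two D classes are distinct objects, despite sharing one source.
print(script_ns["D"] is module_ns["D"])  # False
```

If that assumption holds, D() inside dumper.C.__init__ is never converted: it simply refers to the D of the imported copy, whose module name is dumper rather than __main__.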