How does the automatic full qualification of class

Posted 2019-04-10 08:51

Question:

(It is possible to directly jump to the question, further down, and to skip the introduction.)

There is a common difficulty with pickling Python objects from user-defined classes:

# This is program dumper.py
import pickle

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(C(), f)

Trying to load the object back from another program, loader.py, with

# This is program loader.py
import pickle
with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)

results in

AttributeError: 'module' object has no attribute 'C'

This happens because the class is pickled by name only ("C", together with the name of the module that defines it), and the loader.py program does not know anything about C. A common solution is to import the class in loader.py:

import pickle
from dumper import C  # Class C is now available when unpickling

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)

However, this solution has a few drawbacks, including the fact that all the classes referenced by the pickled objects have to be imported (there can be many); furthermore, the local namespace becomes polluted by names from the dumper.py program.
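
The by-name lookup behind the AttributeError above can be reproduced in a single process. The following is a minimal sketch, not the original programs: it registers a throwaway module (the name dumper_demo is made up for this illustration) to stand in for the pickling program, pickles an instance, and then removes the class so that the stored name can no longer be resolved. It is written for Python 3, where the wording of the error message differs from the Python 2 message quoted above.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the pickling program; "dumper_demo" is an
# illustrative name, not part of the original code.
demo = types.ModuleType("dumper_demo")
exec("class C(object):\n    pass\n", demo.__dict__)
sys.modules["dumper_demo"] = demo

payload = pickle.dumps(demo.C())

# The payload stores the module and class names, not the class body.
names_in_payload = b"dumper_demo" in payload and b"C" in payload

# Simulate a loader that has no such class: remove it from the module.
del demo.C
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

print(names_in_payload, type(load_error).__name__)
```

The load fails only at unpickling time, when pickle tries to look the class up again under the recorded names.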

Now, a solution to this consists of fully qualifying objects prior to pickling:

# New dumper.py program:
import pickle
import dumper  # This is this very program!

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified class

Unpickling with the original loader.py program above now works directly (no need to do from dumper import C).

Question: Now, other classes from dumper.py seem to be automatically fully qualified upon pickling, and I would love to know how this works, and whether this is a reliable, documented behavior:

import pickle
import dumper  # This is this very program!

class D(object):  # New class!
    pass

class C(object):
    def __init__(self):
        self.d = D()  # *NOT* fully qualified

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified pickle class

Now, unpickling with the original loader.py program also works (no need to do from dumper import C); print obj.d gives a fully qualified class, which I find surprising:

<dumper.D object at 0x122e130>

This behavior is very convenient, since only the top, pickled object has to be fully qualified with the module name (dumper.C()). But is this behavior reliable and documented? How come classes are pickled by name ("D"), yet unpickling decides that the pickled self.d attribute is of class dumper.D (and not some local D class)?

PS: The question, refined: I just noticed a few interesting details that might point to an answer to this question:

In the pickling program dumper.py, print self.d prints <__main__.D object at 0x2af450>, with the first dumper.py program (the one without import dumper). On the other hand, doing import dumper and creating the object with dumper.C() in dumper.py makes print self.d print <dumper.D object at 0x2af450>: the self.d attribute is automatically qualified by Python! So, it appears that the pickle module has no role in the nice unpickling behavior described above.

The question is thus really: why does Python convert D() into the fully qualified dumper.D in the second case? Is this documented somewhere?

Answer 1:

When your classes are defined in your main module, that's where pickle expects to find them when they are unpickled. In your first case, the classes were defined in the main module, so when loader runs, loader is the main module and pickle can't find the classes. If you look at the content of obj.pickle, you'll see the name __main__ exported as the namespace of your C and D classes.

In your second case, dumper.py imports itself. Now you actually have two separate sets of C and D classes defined: one set in __main__ namespace and one set in dumper namespace. You serialize the one in the dumper namespace (look in obj.pickle to verify).
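
The two separate sets of classes can be observed directly. The sketch below is an illustration under assumptions, not the original program: it writes a simplified, self-importing dumper.py to a temporary directory and runs it as a script. The __main__ guard and the print calls are additions made for the demonstration, and the pickling itself is left out.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Simplified variant of the self-importing dumper.py. The __main__ guard
# and the print calls are additions made for this illustration.
SOURCE = textwrap.dedent('''
    import dumper  # this very file, imported under the name "dumper"

    class D(object):
        pass

    class C(object):
        def __init__(self):
            self.d = D()

    if __name__ == "__main__":
        print(C is dumper.C)
        print(type(C().d).__module__)
        print(type(dumper.C().d).__module__)
''')

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "dumper.py")
    with open(script, "w") as f:
        f.write(SOURCE)
    # PYTHONPATH is set so that "import dumper" resolves even if the
    # script directory is not put on sys.path automatically.
    env = dict(os.environ, PYTHONPATH=tmp)
    result = subprocess.run([sys.executable, script], env=env,
                            capture_output=True, text=True, check=True)

lines = result.stdout.split()
print(lines)
```

The first printed value shows that C and dumper.C are distinct class objects; the other two show the module name each set of classes carries.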

pickle imports the module named in the pickle if it is not already loaded, so when loader.py runs, pickle itself imports dumper.py and finds the dumper.C and dumper.D classes there.
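
This import at load time can be checked in isolation. The following sketch assumes a small module file created on the fly (shared_demo is a hypothetical name chosen for the example); it pickles an instance, drops the module from sys.modules to imitate a fresh program, and then unpickles.

```python
import importlib
import os
import pickle
import sys
import tempfile

# "shared_demo" is a hypothetical module name used only for this sketch.
MODULE_SOURCE = "class C(object):\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "shared_demo.py"), "w") as f:
        f.write(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        shared_demo = importlib.import_module("shared_demo")
        payload = pickle.dumps(shared_demo.C())

        # Forget the module, as if a fresh program were starting.
        del sys.modules["shared_demo"]
        loaded_before = "shared_demo" in sys.modules

        obj = pickle.loads(payload)  # pickle imports shared_demo itself
        loaded_after = "shared_demo" in sys.modules
        class_module = type(obj).__module__
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("shared_demo", None)

print(loaded_before, loaded_after, class_module)
```

No import statement for shared_demo runs between the deletion and the load; the module reappears in sys.modules as a side effect of pickle.loads.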

Since you have two separate scripts, dumper.py and loader.py, it only makes sense to define the classes they share in a common import module:

common.py

class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()

loader.py

import pickle

with open('obj.pickle','rb') as f:
    obj = pickle.load(f)

print(obj)

dumper.py

import pickle
from common import C

with open('obj.pickle','wb') as f:
    pickle.dump(C(),f)

Note that even though dumper.py dumps C() in this case, pickle knows that it is a common.C object (see obj.pickle). When loader.py runs, pickle will dynamically import common.py and successfully load the object.
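
One way to look inside the pickle, rather than reading the raw file, is the standard pickletools module. The sketch below is an assumption-laden illustration: it builds a stand-in for common.py in memory instead of reading a real file, and the helper global_references is a made-up name for this example.

```python
import pickle
import pickletools
import sys
import types

# Stand-in for common.py, built in memory for this sketch.
common = types.ModuleType("common")
exec(
    "class D(object):\n"
    "    pass\n"
    "\n"
    "class C(object):\n"
    "    def __init__(self):\n"
    "        self.d = D()\n",
    common.__dict__,
)
sys.modules["common"] = common

C = common.C  # same effect as "from common import C"


def global_references(payload):
    """Return the "module name" pairs stored by GLOBAL opcodes."""
    return [arg for opcode, arg, _pos in pickletools.genops(payload)
            if opcode.name == "GLOBAL"]


# Protocol 2 is chosen because it writes each class reference as a single
# GLOBAL opcode; newer protocols split it across several opcodes.
references = global_references(pickle.dumps(C(), protocol=2))
print(references)
```

Both classes are recorded under the module name common, even though only C was imported by name and D was never mentioned by the pickling code.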



Answer 2:

Here is what happens: when importing dumper (or doing from dumper import C) from within dumper.py, the whole file is executed again (this can be seen by inserting a print in the module). This behavior is expected, because dumper is not a module that was already loaded: the running script is registered in sys.modules as __main__, not as dumper.
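
The double execution can be made visible with the print suggested above. This is a minimal sketch under assumptions: a three-line dumper.py (not the original program) is written to a temporary directory and run as a script, printing its module name and whether dumper is already in sys.modules.

```python
import os
import subprocess
import sys
import tempfile

# Minimal self-importing script; its content is made up for this sketch.
SOURCE = (
    "import sys\n"
    "print(__name__, 'dumper' in sys.modules)\n"
    "import dumper\n"
)

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "dumper.py")
    with open(script, "w") as f:
        f.write(SOURCE)
    # PYTHONPATH makes sure "import dumper" finds the file.
    env = dict(os.environ, PYTHONPATH=tmp)
    result = subprocess.run([sys.executable, script], env=env,
                            capture_output=True, text=True, check=True)

runs = result.stdout.splitlines()
print(runs)
```

The print runs once under the name __main__, before dumper exists in sys.modules, and once more under the name dumper during the import.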

As illustrated in Mark's answer, importing a module naturally qualifies all the names defined in the module: each class records the name of the module being executed in its __module__ attribute. As a result, self.d = D() creates an object of class dumper.D when file dumper.py is executed again by the import (this is equivalent to importing common.py, in Mark's answer).
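
The effect of the module name on the class can be sketched without any files. The example below is an illustration, not how the import system is implemented: it executes the same class definitions twice with exec, once with __name__ set to "__main__" and once with it set to "dumper". The helper run_as is a hypothetical name.

```python
SOURCE = """
class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()
"""


def run_as(module_name):
    """Execute SOURCE in a fresh namespace that claims to be module_name."""
    namespace = {"__name__": module_name}
    exec(SOURCE, namespace)
    return namespace


as_script = run_as("__main__")  # like running dumper.py directly
as_import = run_as("dumper")    # like "import dumper"

script_d = as_script["C"]().d
import_d = as_import["C"]().d

print(type(script_d).__module__, type(import_d).__module__)
print(repr(import_d).startswith("<dumper.D object at"))
```

The same source text yields classes that report different modules, which matches the two outputs of print self.d described in the question.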

Thus, the import dumper (or from dumper import C) trick is explained, and pickling fully qualifies not only class C but also class D. This makes unpickling by an external program easier!

This also shows that import dumper done in dumper.py forces the Python interpreter to execute the file twice, which is neither efficient nor elegant. Pickling classes in one program and unpickling them in another is therefore probably best done through the approach outlined in Mark's answer: pickled classes should be in a separate module.