Python package/module lazily loading submodules

Posted 2019-04-17 08:18

Question:

Interesting use case today: I need to migrate a module in our codebase following code changes. The old mynamespace.Document will disappear, and I want to ensure a smooth migration by replacing this package with a code object that dynamically imports the correct path and migrates the corresponding objects.

In short:

# instantiate a dynamic package, but do not
# load its submodules statically
mynamespace.Document = SomeObject()
assert 'submodule' not in mynamespace.Document.__dict__

# and later on, when importing it, the submodule
# is built if not already available in __dict__
from mynamespace.Document.submodule import klass
c = klass()

A few things to note:

  • I am not talking only about migrating code. A single large sed would be enough to rewrite the imports in the source, and I would not need a dynamic module for that. I am talking about objects. A website holding live/stored objects will need migration. Those objects will be loaded on the assumption that mynamespace.Document.submodule.klass exists, and that is the reason for the dynamic module: I need to give the site something to load.
  • We cannot, or do not want to, change the way objects are unpickled/loaded. For simplicity, let's just say that the idiom from mynamespace.Document.submodule import klass must keep working. I cannot use from mynamespace import Document as container; klass = getattr(getattr(container, 'submodule'), 'klass') instead.
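To make the unpickling constraint concrete, here is a minimal sketch (written for Python 3, which is an assumption, since the code in this thread is Python 2). A pickle stores only the module path and class name, so loading re-imports that path. The module name legacy_demo is hypothetical and is registered by hand purely for illustration.

```python
import pickle
import sys
from types import ModuleType

# Hypothetical legacy module, registered by hand for the demonstration.
legacy = ModuleType("legacy_demo")
klass = type("klass", (object,), {"__module__": "legacy_demo"})
legacy.klass = klass
sys.modules["legacy_demo"] = legacy

# The pickle records "legacy_demo" and "klass", not the class body.
payload = pickle.dumps(klass())

# Simulate the migration: the old module path disappears.
del sys.modules["legacy_demo"]
try:
    pickle.loads(payload)
    loaded_without_module = True
except ImportError:
    loaded_without_module = False

# As soon as something importable sits under the old name, loading works again.
sys.modules["legacy_demo"] = legacy
restored = pickle.loads(payload)
```

This is why a plain search-and-replace over the source is not enough here: the stored objects still point at the old dotted path.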

What I tried:

import sys
from types import ModuleType

class VerboseModule(ModuleType):
    def __init__(self, name, doc=None):
        super(VerboseModule, self).__init__(name, doc)
        sys.modules[name] = self
    def __repr__(self):
        return "<%s %s>" % (self.__class__.__name__, self.__name__)
    def __getattribute__(self, name):
        if name not in ('__name__', '__repr__', '__class__'):
            print "fetching attribute %s for %s" % (name, self)
        return super(VerboseModule, self).__getattribute__(name)

class DynamicModule(VerboseModule):
    """
    This module generates a dummy class when asked for a component
    """
    def __getattr__(self, name):
        class Dummy(object):
            pass
        Dummy.__name__ = name
        Dummy.__module__ = self.__name__  # __module__ must be the module's name (a string)
        setattr(self, name, Dummy)
        return Dummy

class DynamicPackage(VerboseModule):
    """
    This package should generate dummy modules
    """
    def __getattr__(self, name):
        mod = DynamicModule("%s.%s" % (self.__name__, name))
        setattr(self, name, mod)
        return mod

DynamicModule("foobar")
# (the import prints:)
# fetching attribute __path__ for <DynamicModule foobar>
# fetching attribute DynamicModuleWorks for <DynamicModule foobar>
# fetching attribute DynamicModuleWorks for <DynamicModule foobar>
from foobar import DynamicModuleWorks
print DynamicModuleWorks

DynamicPackage('document')
# fetching attribute __path__ for <DynamicPackage document>
from document.submodule import ButDynamicPackageDoesNotWork
# Traceback (most recent call last):
# File "dynamicmodule.py", line 40, in <module>
#   from document.submodule import ButDynamicPackageDoesNotWork
# ImportError: No module named submodule

As you can see, DynamicPackage does not work. I do not understand what is happening, because document is not even asked for a ButDynamicPackageDoesNotWork attribute.

Can anyone clarify what is happening, and whether and how I can fix it?

Answer 1:

The problem is that Python bypasses the entry for document in sys.modules and tries to load the file for submodule directly. Of course, that file doesn't exist.

Demonstration:

>>> import multiprocessing
>>> multiprocessing.heap = None
>>> import multiprocessing.heap
>>> multiprocessing.heap
<module 'multiprocessing.heap' from '/usr/lib/python2.6/multiprocessing/heap.pyc'>

We might expect heap to still be None, because Python could just reuse what is already reachable through the multiprocessing entry in sys.modules, but that doesn't happen. The dotted notation essentially maps directly to {something on python path}/document/submodule.py, and an attempt is made to load that file directly.
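The same effect can be reproduced without touching the standard library. The sketch below assumes Python 3 and uses hypothetical module names (demo_pkg, demo_pkg.sub) built by hand. Setting an attribute on the parent module is not enough; the import system only accepts the child once it is registered in sys.modules under its full dotted name (or can be found by a finder).

```python
import importlib
import sys
from types import ModuleType

# Hypothetical names, used only for this demonstration.
parent = ModuleType("demo_pkg")
child = ModuleType("demo_pkg.sub")
child.klass = type("klass", (object,), {})

sys.modules["demo_pkg"] = parent
parent.sub = child  # attribute only; no sys.modules entry for the child

try:
    importlib.import_module("demo_pkg.sub")
    attribute_was_enough = True
except ImportError:
    # The import system never looks at parent.sub.
    attribute_was_enough = False

# Registering the child under its dotted name is what the import system checks.
sys.modules["demo_pkg.sub"] = child
from demo_pkg.sub import klass
```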

Update

The trick is to hook into Python's import system. The following code requires your DynamicModule class.

import sys

class DynamicImporter(object):
    """this class works as both a finder and a loader."""
    def __init__(self, lazy_packages):
        self.packages = lazy_packages

    def load_module(self, fullname):
        """this makes the class a loader. It is given name of a module and expected
           to return the module object"""
        print "loading {0}".format(fullname)
        components = fullname.split('.')
        components = ['.'.join(components[:i+1])
                      for i in range(len(components))]
        for component in components:
            if component not in sys.modules:
                DynamicModule(component)
                print "{0} created".format(component)
        return sys.modules[fullname]


    def find_module(self, fullname, path=None):
        """This makes the class a finder. It is given the name of a module as well as
           the package that contains it (if applicable). It is expected to return a 
           loader for that module if it knows of one or None in which case other methods
           will be tried"""
        if fullname.split('.')[0] in self.packages:
            print "found {0}".format(fullname)
            return self
        else:
            return None


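# sys.meta_path is a list of finder objects, which is empty by default.
# It is tried before anything else when a request to import a module is encountered.
# Pass a list of package names: with a plain string, the `in` test in find_module
# would do a substring match instead of a membership test.
sys.meta_path.append(DynamicImporter(['foo']))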

from foo.bar import ThisShouldWork
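
For readers on Python 3, where find_module and load_module have been superseded by find_spec, create_module and exec_module, the same idea might look roughly like the sketch below. It is not part of the original answer. LazyModule, LazyImporter and the package name lazy_demo are hypothetical names; LazyModule stands in for DynamicModule and, unlike it, refuses dunder lookups so the import machinery's own probes are not answered with a dummy class.

```python
import importlib.abc
import importlib.machinery
import sys
from types import ModuleType


class LazyModule(ModuleType):
    """Hypothetical stand-in for DynamicModule: builds a dummy class on demand."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        dummy = type(name, (object,), {"__module__": self.__name__})
        setattr(self, name, dummy)  # cache, so the next lookup is a plain hit
        return dummy


class LazyImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Acts as both finder and loader for the listed top-level packages."""

    def __init__(self, lazy_packages):
        self.packages = set(lazy_packages)

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split(".")[0] not in self.packages:
            return None  # let the other finders handle it
        # is_package=True gives each lazy module a __path__, so it can have children.
        return importlib.machinery.ModuleSpec(fullname, self, is_package=True)

    def create_module(self, spec):
        return LazyModule(spec.name)

    def exec_module(self, module):
        pass  # nothing to execute; attributes are created lazily


sys.meta_path.insert(0, LazyImporter(["lazy_demo"]))

from lazy_demo.submodule import klass
```

Here the import machinery itself registers lazy_demo and lazy_demo.submodule in sys.modules, so the loader does not need to walk the dotted name by hand as load_module does above.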