Lazy module variables--can it be done?

2020-02-02 11:02发布

问题:

I'm trying to find a way to lazily load a module-level variable.

Specifically, I've written a tiny Python library to talk to iTunes, and I want to have a DOWNLOAD_FOLDER_PATH module variable. Unfortunately, iTunes won't tell you where its download folder is, so I've written a function that grabs the filepath of a few podcast tracks and climbs back up the directory tree until it finds the "Downloads" directory.

This takes a second or two, so I'd like to have it evaluated lazily, rather than at module import time.

Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?

回答1:

You can't do it with modules, but you can disguise a class "as if" it was a module, e.g., in itun.py, code...:

import sys

class _Sneaky(object):
  def __init__(self):
    self.download = None

  @property
  def DOWNLOAD_PATH(self):
    if not self.download:
      self.download = heavyComputations()
    return self.download

  def __getattr__(self, name):
    return globals()[name]

# other parts of itun that you WANT to code in
# module-ish ways

sys.modules[__name__] = _Sneaky()

Now anybody can import itun... and get in fact your itun._Sneaky() instance. The __getattr__ is there to let you access anything else in itun.py that may be more convenient for you to code as a top-level module object, than inside _Sneaky!_)



回答2:

I used Alex' implementation on Python 3.3, but this crashes miserably: The code

  def __getattr__(self, name):
    return globals()[name]

is not correct because an AttributeError should be raised, not a KeyError. This crashed immediately under Python 3.3, because a lot of introspection is done during the import, looking for attributes like __path__, __loader__ etc.

Here is the version that we use now in our project to allow for lazy imports in a module. The __init__ of the module is delayed until the first attribute access that has not a special name:

""" config.py """
# lazy initialization of this module to avoid circular import.
# the trick is to replace this module by an instance!
# modelled after a post from Alex Martelli :-)

Lazy module variables--can it be done?

class _Sneaky(object):
    def __init__(self, name):
        self.module = sys.modules[name]
        sys.modules[name] = self
        self.initializing = True

    def __getattr__(self, name):
        # call module.__init__ after import introspection is done
        if self.initializing and not name[:2] == '__' == name[-2:]:
            self.initializing = False
            __init__(self.module)
        return getattr(self.module, name)

_Sneaky(__name__)

The module now needs to define an init function. This function can be used to import modules that might import ourselves:

def __init__(module):
    ...
    # do something that imports config.py again
    ...

The code can be put into another module, and it can be extended with properties as in the examples above.

Maybe that is useful for somebody.



回答3:

It turns out that as of Python 3.7, it's possible to do this cleanly by defining a __getattr__() at the module level, as specified in PEP 562.

# mymodule.py

from typing import Any

DOWNLOAD_FOLDER_PATH: str

def _download_folder_path() -> str:
    global DOWNLOAD_FOLDER_PATH
    DOWNLOAD_FOLDER_PATH = ... # compute however ...
    return DOWNLOAD_FOLDER_PATH

def __getattr__(name: str) -> Any:
    if name == "DOWNLOAD_FOLDER_PATH":
        return _download_folder_path()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


回答4:

Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?

I think you are correct in saying that a function is the best solution to your problem here. I will give you a brief example to illustrate.

#myfile.py - an example module with some expensive module level code.

import os
# expensive operation to crawl up in directory structure

The expensive operation will be executed on import if it is at module level. There is not a way to stop this, short of lazily importing the entire module!!

#myfile2.py - a module with expensive code placed inside a function.

import os

def getdownloadsfolder(curdir=None):
    """a function that will search upward from the user's current directory
        to find the 'Downloads' folder."""
    # expensive operation now here.

You will be following best practice by using this method.



回答5:

Recently I came across the same problem, and have found a way to do it.

class LazyObject(object):
    def __init__(self):
        self.initialized = False
        setattr(self, 'data', None)

    def init(self, *args):
        #print 'initializing'
        pass

    def __len__(self): return len(self.data)
    def __repr__(self): return repr(self.data)

    def __getattribute__(self, key):
        if object.__getattribute__(self, 'initialized') == False:
            object.__getattribute__(self, 'init')(self)
            setattr(self, 'initialized', True)

        if key == 'data':
            return object.__getattribute__(self, 'data')
        else:
            try:
                return object.__getattribute__(self, 'data').__getattribute__(key)
            except AttributeError:
                return super(LazyObject, self).__getattribute__(key)

With this LazyObject, You can define a init method for the object, and the object will be initialized lazily, example code looks like:

o = LazyObject()
def slow_init(self):
    time.sleep(1) # simulate slow initialization
    self.data = 'done'
o.init = slow_init

the o object above will have exactly the same methods whatever 'done' object have, for example, you can do:

# o will be initialized, then apply the `len` method 
assert len(o) == 4

complete code with tests (works in 2.7) can be found here:

https://gist.github.com/observerss/007fedc5b74c74f3ea08



回答6:

The proper way of doing this, according to the Python docs, is to subclass types.ModuleType and then dynamically update the module's __class__. So, here's a solution loosely on Christian Tismer's answer but probably not resembling it much at all:

import sys
import types

class _Sneaky(types.ModuleType):
    @property
    def DOWNLOAD_FOLDER_PATH(self):
        if not hasattr(self, '_download_folder_path'):
            self._download_folder_path = '/dev/block/'
        return self._download_folder_path
sys.modules[__name__].__class__ = _Sneaky


回答7:

Since Python 3.7 (and as a result of PEP-562), this is now possible with the module-level __getattr__:

Inside your module, put something like:

def _long_function():
    # print() function to show this is called only once
    print("Determining DOWNLOAD_FOLDER_PATH...")
    # Determine the module-level variable
    path = "/some/path/here"
    # Set the global (module scope)
    globals()['DOWNLOAD_FOLDER_PATH'] = path
    # ... and return it
    return path


def __getattr__(name):
    if name == "DOWNLOAD_FOLDER_PATH":
        return _long_function()

    # Implicit else
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

From this it should be clear that the _long_function() isn't executed when you import your module, e.g.:

print("-- before import --")
import somemodule
print("-- after import --")

results in just:

-- before import --
-- after import --

But when you attempt to access the name from the module, the module-level __getattr__ will be called, which in turn will call _long_function, which will perform the long-running task, cache it as a module-level variable, and return the result back to the code that called it.

For example, with the first block above inside the module "somemodule.py", the following code:

import somemodule
print("--")
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')

produces:

--
Determining DOWNLOAD_FOLDER_PATH...
/some/path/here
--
/some/path/here
--

or, more clearly:

# LINE OF CODE                                # OUTPUT
import somemodule                             # (nothing)

print("--")                                   # --

print(somemodule.DOWNLOAD_FOLDER_PATH)        # Determining DOWNLOAD_FOLDER_PATH...
                                              # /some/path/here

print("--")                                   # --

print(somemodule.DOWNLOAD_FOLDER_PATH)        # /some/path/here

print("--")                                   # --

Lastly, you can also implement __dir__ as the PEP describes if you want to indicate (e.g. to code introspection tools) that DOWNLOAD_FOLDER_PATH is available.



回答8:

If that variable lived in a class rather than a module, then you could overload getattr, or better yet, populate it in init.