What can be done to speed up this memoization deco

2019-03-22 02:51发布

问题:

What I want is a memoization decorator that:

  • can memoize instance methods with both arguments and keyword arguments
  • has a cache that can be cleared (globally) with one call (vs. this one that uses a per-function cache: python resettable instance method memoization decorator)
  • is reasonably efficient

I've tweaked an example I saw and came up with the following:

import functools

class Memoized(object):
  """Decorator that caches a function's return value each time it is called.
  If called later with the same arguments, the cached value is returned, and
  not re-evaluated.
  """

  __cache = {}

  def __init__(self, func):
    self.func = func
    self.key = (func.__module__, func.__name__)

  def __call__(self, *args):
    try:
      return Memoized.__cache[self.key][args]
    except KeyError:
      value = self.func(*args)
      Memoized.__cache[self.key] = {args : value}
      return value
    except TypeError:
      # uncachable -- for instance, passing a list as an argument.
      # Better to not cache than to blow up entirely.
      return self.func(*args)

  def __get__(self, obj, objtype):
    """Support instance methods."""
    return functools.partial(self.__call__, obj)

  @staticmethod
  def reset():
    Memoized.__cache = {}

My problem with it is that the caching part seems to involve a lot of overhead (eg. for recursive functions). Using the following function as an example, I can call fib(30) ten times with the non-memoized version in less time than the memoized version.

def fib(n):

   if n in (0, 1):
      return n
   return fib(n-1) + fib(n-2)

Can anyone suggest a better way to write this decorator? (or point me to a better (ie. faster) decorator that does what I want). I'm not interested in preserving method signatures, or helping introspection tools "know" anything about the decorated function.

Thanks.

P.S. Using python 2.7

回答1:

You're not actually caching any data, because each time you set a new cached value you overwrite the previous:

Memoized.__cache[self.key] = {args : value}

eg.

import functools

class Memoized(object):
    """Decorator that caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned, and
    not re-evaluated.
    """

    cache = {}

    def __init__(self, func):
        self.func = func
        self.key = (func.__module__, func.__name__)
        self.cache[self.key] = {}

    def __call__(self, *args):
      try:
          return Memoized.cache[self.key][args]
      except KeyError:
          value = self.func(*args)
          Memoized.cache[self.key][args] = value
          return value
      except TypeError:
          # uncachable -- for instance, passing a list as an argument.
          # Better to not cache than to blow up entirely.
          return self.func(*args)

    def __get__(self, obj, objtype):
        """Support instance methods."""
        return functools.partial(self.__call__, obj)

    @staticmethod
    def reset():
        Memoized.cache = {}
  • fib(30) without caching: 2.86742401123
  • fib(30) with caching: 0.000198125839233

Some other notes:

  • Don't use __prefix; there's no reason to here and it only uglifies the code.
  • Instead of using a single, monolithic, class-attribute dict, give each instance of Memoized its own dict, and keep a registry of Memoized objects. This improves encapsulation, and removes the oddity of depending on the module and function names.

.

import functools
import weakref

class Memoized(object):
    """Decorator that caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned, and
    not re-evaluated.

    >>> counter = 0
    >>> @Memoized
    ... def count():
    ...     global counter
    ...     counter += 1
    ...     return counter

    >>> counter = 0
    >>> Memoized.reset()
    >>> count()
    1
    >>> count()
    1
    >>> Memoized.reset()
    >>> count()
    2

    >>> class test(object):
    ...     @Memoized
    ...     def func(self):
    ...         global counter
    ...         counter += 1
    ...         return counter
    >>> testobject = test()
    >>> counter = 0
    >>> testobject.func()
    1
    >>> testobject.func()
    1
    >>> Memoized.reset()
    >>> testobject.func()
    2
    """

    caches = weakref.WeakSet()

    def __init__(self, func):
        self.func = func
        self.cache = {}
        Memoized.caches.add(self)

    def __call__(self, *args):
      try:
          return self.cache[args]
      except KeyError:
          value = self.func(*args)
          self.cache[args] = value
          return value
      except TypeError:
          # uncachable -- for instance, passing a list as an argument.
          # Better to not cache than to blow up entirely.
          return self.func(*args)

    def __get__(self, obj, objtype):
        """Support instance methods."""
        return functools.partial(self.__call__, obj)

    @staticmethod
    def reset():
        for memo in Memoized.caches:
            memo.cache = {}

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Edited: add tests, and use weakref.WeakSet. Note that WeakSet is only available in 2.7 (which the OP is using); for a version that works in 2.6, see the edit history.



回答2:

Here is a version that is significantly faster. Unfortunately reset can no longer actually completely clear the cache since all the instances are storing their own local copy of the reference to the per-function dictionary. Though you can sort of get it to work:

import functools

class Memoized(object):
  """Decorator that caches a function's return value each time it is called.
  If called later with the same arguments, the cached value is returned, and
  not re-evaluated.
  """

  __cache = {}

  def __init__(self, func):
    self.func = func
    key = (func.__module__, func.__name__)
    if key not in self.__cache:
      self.__cache[key] = {}
    self.mycache = self.__cache[key]

  def __call__(self, *args):
    try:
      return self.mycache[args]
    except KeyError:
      value = self.func(*args)
      self.mycache[args] = value
      return value
    except TypeError:
      # uncachable -- for instance, passing a list as an argument.
      # Better to not cache than to blow up entirely.
      return self.func(*args)

  def __get__(self, obj, objtype):
    """Support instance methods."""
    return functools.partial(self.__call__, obj)

  @classmethod
  def reset(cls):
    for v in cls.__cache.itervalues():
      v.clear()