Decorators for selective caching / memoization

Posted 2020-07-18 02:36

Question:

I am looking for a way to build a decorator, @memoize, that I can apply to functions as follows:

@memoize
def my_function(a, b, c):
    # Do stuff
    # result may not always be the same for fixed (a, b, c)
    return result

Then, if I do:

result1 = my_function(a=1,b=2,c=3)
# my_function runs (slow). We cache the result for later

result2 = my_function(a=1, b=2, c=3)
# The decorator reads the cache and returns the result (fast)

Now say that I want to force a cache update:

result3 = my_function(a=1, b=2, c=3, force_update=True)
# The function runs *again* for values a, b, and c. 

result4 = my_function(a=1, b=2, c=3)
# We read the cache

At the end of the above, we always have result4 == result3, but not necessarily result4 == result1, which is why one needs an option to force a cache update for the same input parameters.

How can I approach this problem?

Note on joblib

As far as I know, joblib supports .call, which forces a re-run but does not update the cache.

Follow-up on using klepto:

Is there any way to have klepto (see @Wally's answer) cache its results under a specific location by default (e.g. /some/path/) and share this location across multiple functions? For example, I would like to say

cache_path = "/some/path/"

and then @memoize several functions in a given module under the same path.

Answer 1:

I would suggest looking at joblib and klepto. Both have very configurable caching algorithms, and may do what you want.

Both definitely can do the caching for result1 and result2, and klepto provides access to the cache, so one can pop a result from the local memory cache (without removing it from a stored archive, say in a database).

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> hasher = hashmap(algorithm='md5')
>>> @memoize(keymap=hasher)
... def squared(x):
...   print("called")
...   return x**2
... 
>>> squared(1)
called
1
>>> squared(2)
called
4
>>> squared(3)
called
9
>>> squared(2)
4
>>> 
>>> cache = squared.__cache__()
>>> # delete the 'key' for x=2
>>> cache.pop(squared.key(2))
4
>>> squared(2)
called
4

Not exactly the keyword interface you were looking for, but it has the functionality you are looking for.
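
To get closer to the keyword interface from the question, the pop-then-call pattern above could be wrapped in a small adapter. This is only a sketch: with_force_update is a hypothetical helper name, and it assumes nothing beyond the two attributes used in the session above, .key(...) and .__cache__(). To keep the example runnable without klepto installed, tiny_memoize is a made-up stand-in that exposes those same two attributes; it is not part of klepto.

```python
import functools


def with_force_update(cached_func):
    """Hypothetical adapter: adds force_update to a function exposing key() and __cache__()."""

    @functools.wraps(cached_func)
    def wrapper(*args, **kwargs):
        if kwargs.pop("force_update", False):
            # Drop the stale entry (if any) so the call below recomputes and re-caches.
            cached_func.__cache__().pop(cached_func.key(*args, **kwargs), None)
        return cached_func(*args, **kwargs)

    return wrapper


def tiny_memoize(func):
    """Stand-in for a klepto-style decorator (illustration only, not klepto itself)."""
    store = {}

    def cached(*args, **kwargs):
        k = cached.key(*args, **kwargs)
        if k not in store:
            store[k] = func(*args, **kwargs)
        return store[k]

    cached.key = lambda *args, **kwargs: (args, tuple(sorted(kwargs.items())))
    cached.__cache__ = lambda: store
    return cached


calls = []  # records every real execution of squared


@with_force_update
@tiny_memoize
def squared(x):
    calls.append(x)
    return x ** 2


squared(2)                     # computed and cached
squared(2)                     # served from the cache
squared(2, force_update=True)  # entry dropped, recomputed, cached again
```

If the adapter behaves the same way on top of a klepto-decorated function, force_update=True would refresh the in-memory entry only; whether a backing archive is also refreshed depends on how the cache is configured and is not covered by this sketch.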



Answer 2:

You can do something like this:

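import functools
import pickle


def memoize(func):
    cache = {}  # maps pickled call arguments to results

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Strip force_update so it is never passed on to func.
        force_update = kwargs.pop('force_update', False)
        # Sort the keyword arguments so their order does not change the key.
        key = pickle.dumps((args, sorted(kwargs.items())))
        if force_update or key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrapper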

The decorator accepts an extra parameter, force_update (you don't need to declare it in your function), and pops it from kwargs. So if you haven't called the function with these arguments before, OR you pass force_update=True, the function will be called:

@memoize
def f(a=0, b=0, c=0):
    import random
    return [a, b, c, random.randint(1, 10)]


>>> print(f(a=1, b=2, c=3))
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3, force_update=True))
[1, 2, 3, 2]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache as well
[1, 2, 3, 2]


Answer 3:

If you want to do it yourself:

def memoize(func):
    cache = {}
    def cacher(a, b, c, force_update=False):
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]
    return cacher
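
The version above is tied to exactly three arguments. A possible generalization, offered as a sketch rather than a definitive implementation, binds the call to the function's signature so that positional and keyword calls share one cache key. It assumes every argument value is hashable and that the wrapped function does not take **kwargs; the counter is only there to make repeated runs distinguishable.

```python
import functools
import inspect
import itertools


def memoize(func):
    """Memoize on the bound arguments; force_update=True recomputes and overwrites."""
    cache = {}
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, force_update=False, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # f(1, 2, 3) and f(a=1, b=2, c=3) produce the same key.
        key = tuple(bound.arguments.items())
        if force_update or key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


counter = itertools.count()  # makes each real run return a different value


@memoize
def my_function(a, b, c=3):
    return (a, b, c, next(counter))


result1 = my_function(1, 2)                     # runs the function
result2 = my_function(a=1, b=2, c=3)            # same key, read from the cache
result3 = my_function(1, 2, force_update=True)  # runs again, overwrites the entry
result4 = my_function(1, 2, 3)                  # reads the refreshed entry
```

Here result2 equals result1 and result4 equals result3, while result3 differs from result1, which matches the behaviour asked for in the question.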


Answer 4:

This is purely with regard to the follow-up question about klepto.

The following extends @Wally's example to specify a directory:

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> from klepto.archives import dir_archive
>>> hasher = hashmap(algorithm='md5')
>>> dir_cache = dir_archive('/tmp/some/path/squared')
>>> dir_cache2 = dir_archive('/tmp/some/path/tripled')
>>> @memoize(keymap=hasher, cache=dir_cache)
... def squared(x):
...   print("called")
...   return x**2
>>> 
>>> @memoize(keymap=hasher, cache=dir_cache2)
... def tripled(x):
...   print('called')
...   return 3*x
>>>

Alternatively, you could use a file_archive (also from klepto.archives), specifying the path as:

cache = file_archive('/tmp/some/path/file.py')
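
For comparison, the "one cache_path shared by several functions" idea from the follow-up can also be sketched with the standard library alone. This is not klepto's mechanism: disk_memoize is a hypothetical helper that stores each result as a pickle file under cache_path/<function name>/, and it uses a SHA-256 digest of the pickled arguments as the file name. A temporary directory stands in for /some/path/ so the example can run anywhere.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_memoize(cache_dir):
    """Return a decorator that caches results as pickle files under cache_dir."""

    def decorator(func):
        # One subdirectory per function keeps the shared location tidy.
        func_dir = os.path.join(cache_dir, func.__name__)
        os.makedirs(func_dir, exist_ok=True)

        @functools.wraps(func)
        def wrapper(*args, force_update=False, **kwargs):
            raw_key = pickle.dumps((args, sorted(kwargs.items())))
            path = os.path.join(func_dir, hashlib.sha256(raw_key).hexdigest() + ".pkl")
            if not force_update and os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)  # overwrite any stale entry
            return result

        return wrapper

    return decorator


cache_path = tempfile.mkdtemp()     # stand-in for "/some/path/"
memoize = disk_memoize(cache_path)  # one decorator shared across the module

calls = []  # records every real execution


@memoize
def squared(x):
    calls.append(("squared", x))
    return x ** 2


@memoize
def tripled(x):
    calls.append(("tripled", x))
    return 3 * x


squared(2)                     # computed, written to disk
squared(2)                     # read back from disk
tripled(2)                     # separate subdirectory, computed
squared(2, force_update=True)  # recomputed, file overwritten
```

Because the cache lives on disk, results survive between interpreter sessions. The sketch does no locking, so concurrent writers to the same entry are not handled.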