Question:
I am looking for a way of building a decorator @memoize that I can use in functions as follows:
@memoize
def my_function(a, b, c):
    # Do stuff
    # result may not always be the same for fixed (a, b, c)
    return result
Then, if I do:
result1 = my_function(a=1,b=2,c=3)
# The function runs (slow). We cache the result for later
result2 = my_function(a=1, b=2, c=3)
# The decorator reads the cache and returns the result (fast)
Now say that I want to force a cache update:
result3 = my_function(a=1, b=2, c=3, force_update=True)
# The function runs *again* for values a, b, and c.
result4 = my_function(a=1, b=2, c=3)
# We read the cache
At the end of the above, we always have result4 = result3, but not necessarily result4 = result1, which is why one needs an option to force the cache update for the same input parameters.
How can I approach this problem?
Note on joblib
As far as I know, joblib supports .call, which forces a re-run, but it does not update the cache.
Follow-up on using klepto:
Is there any way to have klepto (see @Wally's answer) cache its results by default under a specific location (e.g. /some/path/) and share this location across multiple functions? E.g. I would like to say
cache_path = "/some/path/"
and then @memoize several functions in a given module under the same path.
Answer 1:
I would suggest looking at joblib and klepto. Both have very configurable caching algorithms, and may do what you want.
Both can definitely do the caching for result1 and result2, and klepto provides access to the cache, so one can pop a result from the local memory cache (without removing it from a stored archive, say in a database).
>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> hasher = hashmap(algorithm='md5')
>>> @memoize(keymap=hasher)
... def squared(x):
...     print("called")
...     return x**2
...
>>> squared(1)
called
1
>>> squared(2)
called
4
>>> squared(3)
called
9
>>> squared(2)
4
>>>
>>> cache = squared.__cache__()
>>> # delete the 'key' for x=2
>>> cache.pop(squared.key(2))
4
>>> squared(2)
called
4
This is not exactly the keyword interface you asked for, but it provides the same functionality: popping the key forces the next call to recompute and re-cache the result.
Answer 2:
You can do something like this:
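import pickle  # cPickle on Python 2; pickle works on both Python 2 and 3

def memoize(func):
    cache = {}
    def decorator(*args, **kwargs):
        # force_update is consumed here and never reaches func
        force_update = kwargs.pop('force_update', None)
        # serialize the arguments so unhashable values can be part of the key
        key = pickle.dumps((args, kwargs))
        if force_update or key not in cache:
            res = func(*args, **kwargs)
            cache[key] = res
        else:
            res = cache[key]
        return res
    return decorator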
The decorator accepts an extra parameter, force_update (you don't need to declare it in your function), and pops it from kwargs. So if you haven't called the function with these arguments before, OR you pass force_update=True, the function will be called:
@memoize
def f(a=0, b=0, c=0):
    import random
    return [a, b, c, random.randint(1, 10)]
>>> print(f(a=1, b=2, c=3))
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3, force_update=True))
[1, 2, 3, 2]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache as well
[1, 2, 3, 2]
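One caveat with keying on the pickled (args, kwargs) pair: the same logical call spelled differently, for example f(1, 2, 3) versus f(a=1, b=2, c=3), ends up under different keys. A possible way around this, sketched below for Python 3 only, is to bind the call to the function's signature before building the key. The name memoize_bound is made up for this illustration, and the sketch assumes all argument values are hashable.

```python
import functools
import inspect


def memoize_bound(func):
    """Memoize func, treating positional and keyword spellings of a call alike."""
    sig = inspect.signature(func)
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, force_update=False, **kwargs):
        # Bind the call to the signature and fill in defaults so that
        # f(1, 2, 3) and f(c=3, b=2, a=1) map to the same key.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(bound.arguments.items())  # assumption: values are hashable
        if force_update or key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records every real execution of f


@memoize_bound
def f(a=0, b=0, c=0):
    calls.append((a, b, c))
    return [a, b, c, len(calls)]


first = f(1, 2, 3)                           # runs f
second = f(c=3, b=2, a=1)                    # same key, served from the cache
third = f(a=1, b=2, c=3, force_update=True)  # runs f again and overwrites the entry
fourth = f(1, 2, 3)                          # cached value from the forced run
```

Here force_update is a keyword-only parameter of the wrapper, so it never leaks into func, just as with the kwargs.pop approach above.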
Answer 3:
If you want to do it yourself:
def memoize(func):
    cache = {}
    def cacher(a, b, c, force_update=False):
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]
    return cacher
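To check that this behaves like the result1 to result4 sequence in the question, here is a small usage sketch. The decorator is repeated so the example runs on its own, and the itertools counter is only a stand-in (an assumption for illustration) for a function whose result changes between runs.

```python
import itertools


def memoize(func):
    # Same decorator as above, repeated so the example is self-contained.
    cache = {}

    def cacher(a, b, c, force_update=False):
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]

    return cacher


counter = itertools.count(1)  # stand-in for a result that changes on every run


@memoize
def my_function(a, b, c):
    return (a, b, c, next(counter))


result1 = my_function(a=1, b=2, c=3)                     # runs the function
result2 = my_function(a=1, b=2, c=3)                     # read from the cache
result3 = my_function(a=1, b=2, c=3, force_update=True)  # runs the function again
result4 = my_function(a=1, b=2, c=3)                     # read from the updated cache
```

Note that this version is tied to functions taking exactly the arguments a, b, c, and that those values must be hashable because the tuple (a, b, c) is used directly as the dictionary key.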
Answer 4:
This is purely with regard to the follow-up question about klepto.
The following extends @Wally's example to specify a directory:
>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> from klepto.archives import dir_archive
>>> hasher = hashmap(algorithm='md5')
>>> dir_cache = dir_archive('/tmp/some/path/squared')
>>> dir_cache2 = dir_archive('/tmp/some/path/tripled')
>>> @memoize(keymap=hasher, cache=dir_cache)
... def squared(x):
...     print("called")
...     return x**2
>>>
>>> @memoize(keymap=hasher, cache=dir_cache2)
... def tripled(x):
...     print('called')
...     return 3*x
>>>
You could alternatively use a file_archive, where you specify the path as:
from klepto.archives import file_archive
cache = file_archive('/tmp/some/path/file.py')
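If the goal is only to have several functions share one cache location and klepto is not a requirement, the same idea can be sketched with the standard library alone. This is not klepto's API: disk_memoize is a hypothetical name, the temporary directory stands in for /some/path/, and the sketch assumes Python 3 and picklable arguments and results.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_memoize(cache_path):
    """Return a decorator that stores results as pickle files under cache_path."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, force_update=False, **kwargs):
            # Key on the function name plus its arguments; sorting kwargs
            # makes the key independent of keyword order.
            raw = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            filename = os.path.join(cache_path, hashlib.sha256(raw).hexdigest() + ".pkl")
            if not force_update and os.path.exists(filename):
                with open(filename, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            os.makedirs(cache_path, exist_ok=True)
            with open(filename, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator


cache_path = tempfile.mkdtemp()  # stand-in for "/some/path/"
calls = []  # records every real execution


@disk_memoize(cache_path)
def squared(x):
    calls.append(("squared", x))
    return x ** 2


@disk_memoize(cache_path)
def tripled(x):
    calls.append(("tripled", x))
    return 3 * x


squared(2)                     # computed and written to disk
squared(2)                     # read back from disk
tripled(2)                     # separate entry in the same directory
squared(2, force_update=True)  # recomputed, file overwritten
```

Including the function name in the key is what lets both functions write into one directory without colliding. Only load pickle files from a cache directory you trust.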