In a comment on Is there a decorator to simply cache function return values?, @gerrit points out a problem with passing mutable, but hashable, objects to a function decorated with functools.lru_cache:
If I pass a hashable, mutable argument, and change the value of the
object after the first call of the function, the second call will
return the changed, not the original, object. That is almost certainly
not what the user wants.
From my understanding, assuming the __hash__() method of the mutable objects is manually defined to hash the member variables (rather than using the object's id(), which is the default for custom objects), changing the argument object will change its hash; hence, a second call to the lru_cache-decorated function should not use the cache.
If __hash__() is defined correctly for the mutable argument(s), is there any unintended behavior that can arise from passing mutable arguments to lru_cache-decorated functions?
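To illustrate the premise (a minimal sketch; the Point class with its value-based __hash__() is my own hypothetical example), mutating such an argument changes its hash, so the second call misses the cache and recomputes rather than returning a stale result:

import functools

class Point:
    """Hypothetical mutable class that hashes its member variables."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __hash__(self):
        return hash((self.x, self.y))
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

@functools.lru_cache(maxsize=None)
def norm_squared(p):
    return p.x ** 2 + p.y ** 2

p = Point(3, 4)
print(norm_squared(p))  # 25, computed and cached
p.x = 6                 # mutation changes hash(p)
print(norm_squared(p))  # 52, cache miss: recomputed, not a stale 25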
My comment was wrong/misleading and does not relate to lru_cache, but to any attempt to create a caching function that works more generically.
I needed a caching function that works for functions that take and return NumPy arrays, which are mutable and not hashable. Because NumPy arrays are not hashable, I could not use functools.lru_cache. I ended up writing something like this:
import copy
import functools

def mutable_cache(maxsize=10):
    """In-memory cache like functools.lru_cache but for any object

    This is a re-implementation of functools.lru_cache.  Unlike
    functools.lru_cache, it works for any objects, mutable or not.
    Therefore, it returns a copy, and it is wrong if the mutable
    object has changed!  Use with caution!

    If you call the *resulting* function with the keyword argument
    CLEAR_CACHE=True, the cache will be cleared.  Otherwise, the cache
    is rotated when more than `maxsize` elements exist in it.
    Additionally, if you call the resulting function with NO_CACHE=True,
    it doesn't cache at all.  Be careful with functions returning large
    objects: everything is kept in RAM!

    Args:
        maxsize (int): Maximum number of return values to be remembered.

    Returns:
        New function that has caching implemented.
    """
    sentinel = object()  # unique marker distinguishing "no entry" from None

    def decorating_function(user_function):
        cache = {}
        cache_get = cache.get
        keylist = []  # insertion order of keys; don't make it too long

        def wrapper(*args, **kwds):
            # Pop the control keywords so they are neither passed on to
            # user_function nor included in the cache key.
            if kwds.pop("CLEAR_CACHE", False):
                cache.clear()
                keylist.clear()
            if kwds.pop("NO_CACHE", False):
                return user_function(*args, **kwds)
            # The arguments may not be hashable, so key on their string
            # representation instead.
            key = str(args) + str(kwds)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                # make sure we return a copy of the result; when a = f();
                # b = f(), users should reasonably expect that a is not b.
                return copy.copy(result)
            result = user_function(*args, **kwds)
            cache[key] = result
            keylist.append(key)
            if len(keylist) > maxsize:
                # Evict the oldest entry, removing it from both
                # structures so they cannot drift out of sync.
                cache.pop(keylist.pop(0), None)
            # Return a copy here too, so that mutating the returned
            # object cannot corrupt the cached entry.
            return copy.copy(result)

        return functools.update_wrapper(wrapper, user_function)

    return decorating_function
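For illustration, the resulting decorator can be used like this (a sketch; expensive_transform is a hypothetical example function):

import numpy as np

@mutable_cache(maxsize=4)
def expensive_transform(arr):
    """Hypothetical expensive function of a NumPy array."""
    return np.fft.fft(arr)

x = np.arange(8.0)
a = expensive_transform(x)                 # computed and cached
b = expensive_transform(x)                 # served from the cache, as a copy
c = expensive_transform(x, NO_CACHE=True)  # bypasses the cache entirely
expensive_transform(x, CLEAR_CACHE=True)   # empties the cache first

Because cache hits return copy.copy() of the stored value, a is not b, as the comment in wrapper promises.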
In my first version, I had omitted the copy.copy() call (which should really be copy.deepcopy()), which led to bugs if I changed the returned value and then called the cached function again. After I added the copy.copy() functionality, I realised that I was hogging memory in some cases, primarily because my function counts objects, not total memory usage, which is non-trivial to measure in general in Python (although it should be easy if limited to NumPy arrays). Therefore I added the NO_CACHE and CLEAR_CACHE keywords to the resulting function, which do what their names suggest.
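The effect of the copy can be demonstrated directly (again a sketch; double_it is a hypothetical function): mutating a returned value must not leak into later calls.

import numpy as np

@mutable_cache()
def double_it(arr):
    """Hypothetical cached function returning a NumPy array."""
    return arr * 2

x = np.ones(3)
a = double_it(x)  # array([2., 2., 2.]), computed and cached
a[0] = 99.0       # mutate the *returned* object
b = double_it(x)  # still array([2., 2., 2.]): the mutation did not
                  # leak into the cache, because copies are returned
assert b[0] == 2.0

Note that for a flat NumPy array a shallow copy already duplicates the data buffer; for nested or object-containing results, copy.deepcopy() would be needed, as noted above.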
After writing and using this function, I understand there is more than one good reason for functools.lru_cache to work only for functions with hashable input arguments. Anyone needing a caching function that works with mutable arguments needs to be very careful.