Especially when using recursive code there are massive improvements with lru_cache
. I do understand that a cache is a space that stores data that has to be served fast and saves the computer from recomputing.
How does the Python lru_cache
from functools work internally?
I'm Looking for a specific answer, does it use dictionaries like the rest of Python? Does it only store the return
value?
I know that Python is heavily built on top of dictionaries, however, I couldn't find a specific answer to this question. Hopefully, someone can simplify this answer for all the users on StackOverflow.
Source of functools is available here: https://github.com/python/cpython/blob/3.6/Lib/functools.py
Lru_cache use _lru_cache_wrapper
decorator (python decorator with arguments pattern) which have cache
dictionary in context (every decorated function have own cache dict) where it saves return value of called function. Dictionary key is generated with _make_key
function according to arguments. Added some bold comments:
# ACCORDING TO PASSED maxsize ARGUMENT _lru_cache_wrapper
# DEFINES AND RETURNS ONE OF wrapper DECORATORS
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
# Constants shared by all lru cache instances:
sentinel = object() # unique object used to signal cache misses
cache = {} # RESULTS SAVES HERE
cache_get = cache.get # bound method to lookup a key or return None
# ... maxsize is None:
def wrapper(*args, **kwds):
# Simple caching without ordering or size limit
nonlocal hits, misses
key = make_key(args, kwds, typed) # BUILD A KEY FROM ARGUMENTS
result = cache_get(key, sentinel) # TRYING TO GET PREVIOUS CALLS RESULT
if result is not sentinel: # ALREADY CALLED WITH PASSED ARGUMENTS
hits += 1
return result # RETURN SAVED RESULT
# WITHOUT ACTUALLY CALLING FUNCTION
result = user_function(*args, **kwds) # FUNCTION CALL - if cache[key] empty
cache[key] = result # SAVE RESULT
misses += 1
return result
# ...
return wrapper
You can check out the source code here.
Essentially it uses two data structures, a dictionary mapping function parameters to its result, and a linked list to keep track of your function call history.
The cache is essentially implemented using the followings, which is pretty self-explanatory.
cache = {}
cache_get = cache.get
....
make_key = _make_key # build a key from the function arguments
key = make_key(args, kwds, typed)
result = cache_get(key, sentinel)
The gist of updating the linked list is:
elif full:
oldroot = root
oldroot[KEY] = key
oldroot[RESULT] = result
# update the linked list to pop out the least recent function call information
root = oldroot[NEXT]
oldkey = root[KEY]
oldresult = root[RESULT]
root[KEY] = root[RESULT] = None
......