How does Lru_cache (from functools) Work?

2020-02-17 06:51发布

问题:

Especially when using recursive code there are massive improvements with lru_cache. I do understand that a cache is a space that stores data that has to be served fast and saves the computer from recomputing.

How does the Python lru_cache from functools work internally?

I'm Looking for a specific answer, does it use dictionaries like the rest of Python? Does it only store the return value?

I know that Python is heavily built on top of dictionaries, however, I couldn't find a specific answer to this question. Hopefully, someone can simplify this answer for all the users on StackOverflow.

回答1:

Source of functools is available here: https://github.com/python/cpython/blob/3.6/Lib/functools.py

Lru_cache use _lru_cache_wrapper decorator (python decorator with arguments pattern) which have cache dictionary in context (every decorated function have own cache dict) where it saves return value of called function. Dictionary key is generated with _make_key function according to arguments. Added some bold comments:

# ACCORDING TO PASSED maxsize ARGUMENT _lru_cache_wrapper
# DEFINES AND RETURNS ONE OF wrapper DECORATORS

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # Constants shared by all lru cache instances:
    sentinel = object()      # unique object used to signal cache misses

    cache = {}                                # RESULTS SAVES HERE
    cache_get = cache.get    # bound method to lookup a key or return None

    # ... maxsize is None:

    def wrapper(*args, **kwds):
        # Simple caching without ordering or size limit
        nonlocal hits, misses
        key = make_key(args, kwds, typed)     # BUILD A KEY FROM ARGUMENTS
        result = cache_get(key, sentinel)     # TRYING TO GET PREVIOUS CALLS RESULT
        if result is not sentinel:            # ALREADY CALLED WITH PASSED ARGUMENTS
            hits += 1
            return result                     # RETURN SAVED RESULT
                                              # WITHOUT ACTUALLY CALLING FUNCTION
        result = user_function(*args, **kwds) # FUNCTION CALL - if cache[key] empty
        cache[key] = result                   # SAVE RESULT
        misses += 1
        return result
    # ...

    return wrapper


回答2:

You can check out the source code here.

Essentially it uses two data structures, a dictionary mapping function parameters to its result, and a linked list to keep track of your function call history.

The cache is essentially implemented using the followings, which is pretty self-explanatory.

cache = {}
cache_get = cache.get
....
make_key = _make_key         # build a key from the function arguments
key = make_key(args, kwds, typed)
result = cache_get(key, sentinel)

The gist of updating the linked list is:

elif full:

    oldroot = root
    oldroot[KEY] = key
    oldroot[RESULT] = result

    # update the linked list to pop out the least recent function call information        
    root = oldroot[NEXT]
    oldkey = root[KEY]
    oldresult = root[RESULT]
    root[KEY] = root[RESULT] = None
    ......