Invalidating a path from the Django cache recursively

Posted 2019-01-31 22:14

Question:

I am deleting a single path from the Django cache like this:

from django.core.cache import cache
from django.db.models.signals import post_save
from django.http import HttpRequest
from django.utils.cache import get_cache_key

from models import Graph

def expire_page(path):
    # Build a bare request so Django can derive the cache key for this path.
    request = HttpRequest()
    request.path = path
    key = get_cache_key(request)
    # get_cache_key() returns None when nothing is cached for the path.
    if key is not None:
        cache.delete(key)

def invalidate_cache(sender, instance, **kwargs):
    # Expire the cached page of the Graph instance that was just saved.
    expire_page(instance.get_absolute_url())

post_save.connect(invalidate_cache, sender=Graph)

This works - but is there a way to delete recursively? My paths look like this:

/graph/123
/graph/123/2009-08-01/2009-10-21

Whenever the graph with id "123" is saved, the cache for both paths needs to be invalidated. Can this be done?

Answer 1:

You might want to consider a generational caching strategy; it seems like a good fit for what you are trying to accomplish. With the code you have provided, you would store a "generation" number for each absolute URL. For example, you would initialize "/graph/123" with a generation of one, so its cache key becomes something like "/GENERATION/1/graph/123". When you want to expire the cache for that absolute URL, you increment its generation value (to two in this case). The next time someone looks up "/graph/123", the cache key becomes "/GENERATION/2/graph/123". This also solves the problem of expiring all the sub-pages, provided they build their cache keys from the same generation number as "/graph/123".

It's a bit tricky to understand at first, but it is a really elegant caching strategy which, if done correctly, means you never have to actually delete anything from the cache: stale entries are simply never looked up again. For more information, here is a presentation on generational caching; it is for Rails, but the concept is the same regardless of language.
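
As a rough illustration of the idea above, here is a minimal sketch. It is not taken from the presentation or from Django itself: DictCache is a made-up in-memory stand-in that mirrors the get/set/add/incr methods of Django's cache API, and root_of, generation_key, versioned_key and invalidate are hypothetical helper names. It also assumes that every URL below "/graph/<id>" shares the generation counter of "/graph/<id>".

```python
class DictCache:
    """Hypothetical in-memory stand-in for the subset of the cache API used here."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Store the value only if the key is not present yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        if key not in self._data:
            raise ValueError("Key %r not found" % key)
        self._data[key] += delta
        return self._data[key]


cache = DictCache()


def root_of(path):
    # Assumption: the first two path segments identify the object,
    # e.g. "/graph/123/2009-08-01/2009-10-21" -> "/graph/123".
    parts = path.strip("/").split("/")
    return "/" + "/".join(parts[:2])


def generation_key(root):
    return "generation:" + root


def versioned_key(path):
    # Prefix the path with the current generation of its root.
    root = root_of(path)
    cache.add(generation_key(root), 1)  # start at generation one
    generation = cache.get(generation_key(root))
    return "/GENERATION/%d%s" % (generation, path)


def invalidate(root):
    # Bumping the counter makes every key built from the old generation unreachable.
    cache.add(generation_key(root), 1)
    cache.incr(generation_key(root))
```

Pages would be stored and looked up under versioned_key(path), and the post_save handler would call invalidate(instance.get_absolute_url()) instead of deleting individual keys. The price is one extra cache read for the generation number on every lookup.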



Answer 2:

Another option is to use a cache that supports tagging keys and evicting keys by tag. Django's built-in cache API does not support this approach, but at least one cache backend (not part of Django proper) does.

DiskCache* is an Apache2-licensed disk- and file-backed cache library, written in pure Python and compatible with Django. To use DiskCache in your project, simply install it and configure your CACHES setting.

Installation is easy with pip:

$ pip install diskcache

Then configure your CACHES setting:

CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/tmp/path/to/directory/',
    }
}

The cache set method is extended by an optional tag keyword argument like so:

from django.core.cache import cache

cache.set('/graph/123', value, tag='/graph/123')
cache.set('/graph/123/2009-08-01/2009-10-21', other_value, tag='/graph/123')

diskcache.DjangoCache uses a diskcache.FanoutCache internally. The corresponding FanoutCache is accessible through the _cache attribute and exposes an evict method. To evict all keys tagged with /graph/123, simply call:

cache._cache.evict('/graph/123')

Though it may feel awkward to access an underscore-prefixed attribute, the DiskCache project is stable and unlikely to make significant changes to the DjangoCache implementation.

The Django cache benchmarks page has a discussion of alternative cache backends.

* Disclaimer: I am the original author of the DiskCache project.


Answer 3:

Check out shutil.rmtree() or os.removedirs(). I think the first is probably what you want.

Update based on several comments: Actually, the Django caching mechanism is more general and finer-grained than just using the path for the key (although you can use it at that level). We have some pages that have 7 or 8 separately cached subcomponents that expire based on a range of criteria. Our component cache names reflect the key objects (or object classes) and are used to identify what needs to be invalidated on certain updates.

All of our pages have an overall cache-key based on member/non-member status, but that is only about 95% of the page. The other 5% can change on a per-member basis and so is not cached at all.

How you iterate through your cache to find invalid items is a function of how it's actually stored. If it's files, you can simply use globs and/or recursive directory deletes; if it's some other mechanism, then you'll have to use something else.

What my answer, and some of the comments by others, are trying to say is that how you accomplish cache invalidation is intimately tied to how you are using/storing the cache.

Second Update: @andybak: So I guess your comment means that all of my commercial Django sites are going to explode in flames? Thanks for the heads up on that. I notice you did not attempt an answer to the problem.

Knipknap's problem is that he has a group of cache items that appear to be related and in a hierarchy because of their names, but the key-generation logic of the cache mechanism obliterates that name by creating an MD5 hash of the path + vary_on. Since there is no trace of the original path/params you will have to exhaustively guess all possible path/params combinations, hoping you can find the right group. I have other hobbies that are more interesting.

If you wish to be able to find groups of cached items based on some combination of path and/or parameter values you must either use cache keys that can be pattern matched directly or some system that retains this information for use at search time.
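
As a sketch of the second option (a system that retains the path information), the snippet below keeps a side index from readable paths to the opaque keys that are actually stored. It is an illustration under stated assumptions, not our production code and not part of Django: the plain dict named cache stands in for a real backend, and opaque_key, cache_page, key_index and invalidate_prefix are hypothetical names.

```python
import hashlib

cache = {}      # stand-in for a real cache backend (assumption)
key_index = {}  # readable path -> opaque key actually stored in the cache


def opaque_key(path):
    # Mimics a hashed key from which the original path cannot be recovered.
    return "page." + hashlib.md5(path.encode("utf-8")).hexdigest()


def cache_page(path, content):
    # Store the content and remember which path the opaque key belongs to.
    key = opaque_key(path)
    cache[key] = content
    key_index[path] = key


def invalidate_prefix(prefix):
    """Delete every cached page at `prefix` or below it; return the paths removed."""
    doomed = [p for p in key_index if p == prefix or p.startswith(prefix + "/")]
    for path in doomed:
        cache.pop(key_index.pop(path), None)
    return sorted(doomed)
```

Matching on prefix + "/" keeps "/graph/1234" alive when "/graph/123" is invalidated. With several processes, the index itself would have to live in shared storage (the cache or the database) rather than in a module-level dict.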

Because we had needs not-unrelated to the OP's problem, we took control of template fragment caching (and specifically key generation) over 2 years ago. It allows us to use regexps in a number of ways to efficiently invalidate groups of related cached items. We also added a default timeout and vary_on variable names (resolved at run time) configurable in settings.py; changed the ordering of name and timeout, because it made no sense to always have to override the default timeout in order to name the fragment; made the fragment_name resolvable (i.e. it can be a variable) to work better with a multi-level template inheritance scheme; and a few other things.

The only reason for my initial answer, which was indeed wrong for current Django, is that I have been using saner cache keys for so long that I literally forgot the simple mechanism we walked away from.