Django caching a large list

Posted 2019-04-08 20:37

My Django application deals with 25 MB binary files. Each of them holds about 100,000 "records" of 256 bytes each.

It takes me about 7 seconds to read a binary file from disk and decode it using Python's struct module. I turn the data into a list of about 100,000 items, where each item is a dictionary with values of various types (float, string, etc.).
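For illustration only, a decode loop of this kind might look like the sketch below. The record layout (an int id, a float, a 16-byte name, then padding) is hypothetical, since the real file format is not given here, and the names RECORD and decode_records are made up for the example. It is written for Python 3.

```python
import struct

# Hypothetical 256-byte record: 4-byte int, 8-byte float, 16-byte name,
# then 228 bytes of padding. The real layout is not specified in this post.
RECORD = struct.Struct("<id16s228x")


def decode_records(data):
    """Decode back-to-back 256-byte records into a list of dicts."""
    records = []
    # iter_unpack walks the buffer one record at a time; len(data) must be
    # a multiple of RECORD.size.
    for rec_id, value, raw_name in RECORD.iter_unpack(data):
        records.append({
            "id": rec_id,
            "value": value,
            "name": raw_name.rstrip(b"\x00").decode("ascii"),
        })
    return records
```

Compiling the format once with struct.Struct avoids re-parsing the format string for every record.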

My Django views need to search through this list. Clearly, 7 seconds per request is too long.

I've tried using Django's low-level caching API to cache the whole list, but that won't work because there's a maximum size limit of 1 MB for any single cached item. I've also tried caching the 100,000 list items individually, but that takes a lot more than 7 seconds; most of the time is spent unpickling the items.

Is there a convenient way to store a large list in memory between requests? Can you think of another way to cache the object for use by my Django app?

2 Answers
干净又极端
#2 · 2019-04-08 21:29

Raise memcached's item size limit from 1 MB to 10 MB. To do that, add

-I 10m

to /etc/memcached.conf and restart memcached.
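Before picking a value for -I, it may help to check how large the list actually is once serialized. The sketch below assumes the cache client pickles values (the question's mention of unpickling suggests it does); the helper names pickled_size and fits_in_item_limit are hypothetical, and the result is only an estimate because the client may add its own overhead or compression.

```python
import pickle


def pickled_size(obj):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_item_limit(obj, limit_bytes=10 * 1024 * 1024):
    """Check whether the pickled object fits under the given limit.

    The default of 10 MB matches the -I 10m setting above.
    """
    return pickled_size(obj) <= limit_bytes
```

Calling fits_in_item_limit(records) on the decoded list shows whether 10 MB is enough or whether a larger -I value is needed.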

Also edit the MemcachedCache class in memcached.py, located in /usr/lib/python2.7/dist-packages/django/core/cache/backends, so that it looks like this:

class MemcachedCache(BaseMemcachedCache):
    "An implementation of a cache binding using python-memcached"
    def __init__(self, server, params):
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = 1024*1024*10  # added: raise the client-side limit to 10 MB
        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)
可以哭但决不认输i
#3 · 2019-04-08 21:34

I'm not able to add comments yet, but I wanted to share my quick fix for this problem, since I also ran into python-memcached behaving strangely when SERVER_MAX_VALUE_LENGTH is changed at import time.

Besides the __init__ edit that FizxMike suggests, you can also override the _cache property in the same class. That way you can instantiate the python-memcached Client and pass server_max_value_length explicitly, like this:

from django.core.cache.backends.memcached import BaseMemcachedCache

DEFAULT_MAX_VALUE_LENGTH = 1024 * 1024

class MemcachedCache(BaseMemcachedCache):
    def __init__(self, server, params):
        # options from settings.CACHES[<alias>]['OPTIONS']
        self._options = params.get("OPTIONS", {})
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = self._options.get('SERVER_MAX_VALUE_LENGTH', DEFAULT_MAX_VALUE_LENGTH)

        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)

    @property
    def _cache(self):
        if getattr(self, '_client', None) is None:
            server_max_value_length = self._options.get("SERVER_MAX_VALUE_LENGTH", DEFAULT_MAX_VALUE_LENGTH)
            # More parameters could be passed to the Client here via the
            # OPTIONS setting; omitted for brevity.
            self._client = self._lib.Client(self._servers,
                server_max_value_length=server_max_value_length)

        return self._client

I also prefer to create a separate backend that inherits from BaseMemcachedCache and use that, instead of editing Django's own code.
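As a rough sketch, a settings fragment that points Django at such a custom backend could look like the following. The dotted path myproject.cache_backends.MemcachedCache is a hypothetical location for the class above, and the LOCATION assumes memcached is running locally on its default port. SERVER_MAX_VALUE_LENGTH is the option name read by the class above, not a built-in Django setting.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        # Hypothetical module path; use wherever the custom class lives.
        "BACKEND": "myproject.cache_backends.MemcachedCache",
        # Assumes a local memcached on the default port 11211.
        "LOCATION": "127.0.0.1:11211",
        "OPTIONS": {
            # Read by the custom backend above; 10 MB, to match a
            # server started with -I 10m.
            "SERVER_MAX_VALUE_LENGTH": 10 * 1024 * 1024,
        },
    }
}
```

The client-side value only helps if the memcached server itself accepts items of that size, so the server's -I setting needs to be at least as large.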

Here's the Django memcached backend module for reference: https://github.com/django/django/blob/master/django/core/cache/backends/memcached.py

Thanks for all the help on this thread!
