Python thread-safe object cache

Posted 2019-02-03 13:38

I have implemented a Python web server. Each HTTP request spawns a new thread. I need to cache objects in memory, and since it's a web server, I want the cache to be thread safe. Is there a standard implementation of a thread-safe object cache in Python? I found the following:

http://freshmeat.net/projects/lrucache/

This does not appear to be thread safe. Can anybody point me to a good implementation of a thread-safe cache in Python?

Thanks!

5 Answers
smile是对你的礼貌
#2 · 2019-02-03 13:54

For a thread-safe object you want threading.local:

from threading import local

safe = local()
safe.cache = {}

You can then put and retrieve objects in safe.cache with thread safety. Keep in mind, though, that threading.local gives every thread its own independent copy: nothing is shared between threads, and each thread has to initialize its own safe.cache.
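A minimal sketch of what that looks like in practice (the handle_request name and thread setup are mine), which also shows the key property: each thread sees, and must initialize, its own safe.cache:

from threading import Thread, local

safe = local()

def handle_request(key, value):
    # Each thread lazily creates its own private cache. No other thread
    # can ever see this dictionary, which is why no locking is needed.
    if not hasattr(safe, 'cache'):
        safe.cache = {}
    safe.cache[key] = value
    print(len(safe.cache))  # always 1 in this demo: the cache is per-thread

for args in [('a', 1), ('b', 2)]:
    Thread(target=handle_request, args=args).start()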

爷、活的狠高调
#3 · 2019-02-03 14:01

Point 1. The GIL does not help you here. An example of a (non-thread-safe) cache for something called "stubs" would be:

stubs = {}

def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    if host not in stubs:
        stub = create_new_stub_for_host(host)
        stubs[host] = stub
    return stubs[host]

What can happen is that Thread 1 calls maybe_new_stub('localhost') and discovers the key is not in the cache yet. Now we switch to Thread 2, which calls the same maybe_new_stub('localhost') and also learns the key is not present. Consequently, both threads call create_new_stub_for_host and each puts its own stub into the cache.

The dictionary itself is protected by the GIL, so we cannot corrupt it by concurrent access. The logic of the cache, however, is not protected, so we may end up creating two or more stubs and dropping all except one on the floor.
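A small demonstration of that race (the sleep and the created list are mine, added only to widen and observe the race window):

from threading import Thread
import time

stubs = {}
created = []

def create_new_stub_for_host(host):
    created.append(host)  # record every creation so we can count them
    time.sleep(0.1)       # simulate slow creation; this widens the race window
    return object()

def maybe_new_stub(host):
    if host not in stubs:                             # both threads can pass this check...
        stubs[host] = create_new_stub_for_host(host)  # ...and then both create a stub
    return stubs[host]

threads = [Thread(target=maybe_new_stub, args=('localhost',)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(created))  # typically prints 2: the second stub silently replaces the first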

Point 2. Depending on the nature of the program, you may not want a global cache. Such a shared cache forces synchronization between all your threads; for performance reasons, it is good to keep threads as independent as possible. I happen to need a shared cache; you may not.

Point 3. You may use a simple lock. I took inspiration from https://codereview.stackexchange.com/questions/160277/implementing-a-thread-safe-lrucache and came up with the following, which I believe is safe for my purposes:

import threading

import grpc          # third-party gRPC runtime
import cli_pb2_grpc  # this project's generated gRPC stub module

stubs = {}
lock = threading.Lock()


def maybe_new_stub(host):
    """Return the stub for host, creating and caching it under the lock if needed."""
    with lock:
        if host not in stubs:
            channel = grpc.insecure_channel('%s:6666' % host)
            stub = cli_pb2_grpc.BrkStub(channel)
            stubs[host] = stub
        return stubs[host]

Point 4. It would be best to use an existing library, but I haven't found one I am prepared to vouch for yet.

劳资没心,怎么记你
#4 · 2019-02-03 14:02

Thread-per-request is often a bad idea: if your server experiences huge spikes in load, it will bring the box to its knees. Consider using a thread pool that can grow to a limited size during peak usage and shrink to a smaller size when load is light.
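For example, the standard library's ThreadPoolExecutor gives you a capped pool out of the box (it grows lazily up to max_workers, though it does not shrink back below that on its own); the handler names below are placeholders of mine:

from concurrent.futures import ThreadPoolExecutor

# At most 32 worker threads, no matter how many requests arrive at once.
pool = ThreadPoolExecutor(max_workers=32)

def handle_request(request):
    ...  # the real per-request work goes here

def on_incoming_request(request):
    # Excess requests queue up inside the pool instead of each
    # spawning a fresh thread.
    pool.submit(handle_request, request)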

Rolldiameter
#5 · 2019-02-03 14:16

Well, a lot of operations in Python are thread-safe by default, so a standard dictionary should be OK (at least in certain respects). This is mostly thanks to the GIL, which helps avoid some of the more serious threading issues.

There's a list here: http://coreygoldberg.blogspot.com/2008/09/python-thread-synchronization-and.html that might be useful.

The atomic nature of those operations just means that you won't end up with an entirely inconsistent state when two threads access a dictionary at the same time, so you won't get a corrupted value. However, you would (as with most multi-threaded programming) not be able to rely on the specific ordering of those atomic operations.

So to cut a long story short...

If you have fairly simple requirements and aren't too bothered about the ordering of what gets written into the cache, then you can use a dictionary and know that you'll always get a consistent, non-corrupted value (it just might be out of date).
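As a concrete illustration, dict.setdefault in CPython executes as a single C-level call under the GIL (an implementation detail, not a language guarantee), so racing threads end up agreeing on one stored value even though the computation itself may run more than once:

cache = {}

def lookup(key, compute):
    # Both threads may call compute(key), but setdefault stores only one
    # result and returns that same object to every caller: you get
    # consistency, not efficiency.
    return cache.setdefault(key, compute(key))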

If you want reads and writes to be a bit more consistent with each other, then you might want to look at Django's local-memory cache:

http://code.djangoproject.com/browser/django/trunk/django/core/cache/backends/locmem.py

It uses a read/write lock for locking.

再贱就再见
#6 · 2019-02-03 14:18

You probably want to use memcached instead. It's very fast, very stable, very popular, has good Python libraries, and will let you grow into a distributed cache should you need to:

http://www.danga.com/memcached/
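A minimal sketch with the classic python-memcached client, assuming a memcached server running locally on its default port 11211:

import memcache

mc = memcache.Client(['127.0.0.1:11211'], debug=0)

mc.set('some_key', {'answer': 42}, time=60)  # entry expires after 60 seconds
value = mc.get('some_key')                   # returns None on a cache miss
print(value)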
