We are using the following setup: NGINX + Gunicorn + Flask. We need to add just a little bit of caching, no more than 5 MB per Flask worker. SimpleCache seems to be the simplest possible solution: it uses memory locally, inside the Python process itself.
Unfortunately, the documentation states the following:
"Simple memory cache for single process environments. This class
exists mainly for the development server and is not 100% thread safe."
However, I fail to see where thread safety would matter at all in our setup. I think that Gunicorn keeps several Flask workers running, and each worker has its own small cache. What can possibly go wrong?
I am currently dealing with a scenario in which, after a user logs in to the app, I want to insert their IP address and username into a database.
The way I make sure this happens only once is by storing the user's IP and username in a cache.
The problem arises because each gunicorn process initializes its own cache. If I add the username+ip combination to proc1's cache and proc2 picks up the next request from the same user, proc2 won't find it in its own cache, so it adds it to its cache and to the database again, which is not acceptable. Hence a process-safe cache (not just a thread-safe one) is important in this case.
Example logs:
2015-07-07 22:42:31 - myapp.views:29 - DEBUG - not from cache user1100.100.100.100, <type 'unicode'> [14776]
2015-07-07 22:42:31 - myapp.views:30 - DEBUG - from cache : user1100.100.100.100, <type 'unicode'> [14776]
2015-07-07 22:42:40 - myapp.views:29 - DEBUG - not from cache user1100.100.100.100, <type 'unicode'> [14776]
2015-07-07 22:42:40 - myapp.views:30 - DEBUG - from cache : user1100.100.100.100, <type 'unicode'> [14776]
2015-07-07 22:42:41 - myapp.views:29 - DEBUG - not from cache user1100.100.100.100, <type 'unicode'> [14779]
2015-07-07 22:42:41 - myapp.views:30 - DEBUG - from cache : None, <type 'NoneType'> [14779]
2015-07-07 22:42:41 - myapp.views:32 - DEBUG - new username ip [14779]
2015-07-07 22:42:41 - myapp.views:38 - DEBUG - User : user1, ip : 100.100.100.100, noted at time : Tue Jul 7 22:42:41 2015, login_count : None [14779]
You can see that gunicorn process 14776 added the entry to its cache first. The next request was also picked up by 14776, so the database entry happened only once. The request after that was picked up by 14779, which found nothing in its own cache, added the entry there, and wrote it to the database again.
cache = SimpleCache(threshold=1000, default_timeout=3600)
Using a memcached- or Redis-based cache might solve it, since all workers would then share one cache. I'm experimenting with that myself.
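To illustrate why a shared cache would help, here is a minimal sketch rather than real Flask code. Plain dicts stand in for the caches (one dict per worker models SimpleCache, one shared dict models an external server such as Redis or memcached), and `handle_login`, `simulate`, and the worker names "A" and "B" are made up for the example.

```python
# Plain dicts stand in for the caches: one dict per worker models SimpleCache,
# a single dict shared by all workers models Redis or memcached.

def handle_login(cache, db_rows, username, ip):
    """Record the login in db_rows unless this cache has already seen it."""
    key = username + ip
    if key not in cache:
        cache[key] = True
        db_rows.append((username, ip))


def simulate(caches):
    """Send three logins through workers A, A, B and return the rows written."""
    db_rows = []
    for worker in ("A", "A", "B"):
        handle_login(caches[worker], db_rows, "user1", "100.100.100.100")
    return db_rows


# Each worker owns its cache, as with SimpleCache under Gunicorn.
per_worker_rows = simulate({"A": {}, "B": {}})

# Both workers look at the same store, as with an external cache server.
shared = {}
shared_rows = simulate({"A": shared, "B": shared})

print(len(per_worker_rows))  # 2: worker B repeats the insert
print(len(shared_rows))      # 1: worker B sees the entry made by worker A
```

Note that with a real shared cache the "check, then set" step is still not atomic across processes, so two simultaneous requests could both miss. An atomic operation such as memcached's `add` or Redis's `SET` with the `NX` option, or a unique constraint in the database, would probably be needed to close that gap.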
For your use case with gunicorn, there is no multi-threading issue, since each worker runs single-threaded in its own process. A potential problem, however, is a "dirty" (stale) read of the data.
Think about the following case:
- process1 reads from the database and populates its own cache, cache1
- process2 reads from the same table using the same query and populates its own cache, cache2
- process2 updates the table with new data and invalidates its old cache2 entry
- process1 executes the same query again and reads the outdated data from cache1. This is where the problem occurs, because process1/cache1 is not aware of the database update.
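The sequence above can be reproduced in a few lines. This is only a sketch: dicts stand in for the database table and for the two per-process caches, and the `read` and `write` helpers and the "greeting" key are invented for the illustration.

```python
db = {"greeting": "old"}   # stand-in for the database table
cache1, cache2 = {}, {}    # stand-ins for the per-process caches


def read(cache, key):
    """Return the cached value, loading it from db on a miss."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]


def write(cache, key, value):
    """Update db and invalidate the entry in the writer's own cache only."""
    db[key] = value
    cache.pop(key, None)


first_p1 = read(cache1, "greeting")   # process1 populates cache1
first_p2 = read(cache2, "greeting")   # process2 populates cache2
write(cache2, "greeting", "new")      # process2 updates and invalidates cache2
stale_p1 = read(cache1, "greeting")   # process1 still hits cache1
fresh_p2 = read(cache2, "greeting")   # process2 reloads from db

print(stale_p1)  # old
print(fresh_p2)  # new
```

Until the cache1 entry expires (after `default_timeout` in the SimpleCache case), process1 keeps serving the old value. Whether that is acceptable depends on how stale the data is allowed to be.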