I have taken a database class this semester, and we are studying how to maintain cache consistency between the RDBMS and a cache server such as memcached. The consistency issues arise from race conditions. For example:
- Suppose I do a `get(key)` from the cache and there is a cache miss. Because of the miss, I fetch the data from the database, and then do a `put(key, value)` into the cache.
- But a race condition might happen: some other user might delete the data I fetched from the database, and this delete might happen before I do the `put` into the cache. Thus, ideally the `put` into the cache should not happen, since the data is no longer present in the database.
If the cache entry has a TTL, the entry will eventually expire; but there is still a window during which the data in the cache is inconsistent with the database.
I have been searching for articles/research papers that discuss this kind of issue, but I could not find any useful resources.
When you read, the following happens: check the cache first; on a miss, fetch the row from the database and then populate the cache with it.

When you write, the following happens: update (or delete) the row in the database first, then invalidate the corresponding cache entry.
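A minimal cache-aside sketch of those read and write paths (the `Cache` class and the `db` dict are illustrative stand-ins for a real memcached client and RDBMS, not actual library APIs):

```python
class Cache:
    """Toy stand-in for a memcached client."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

cache = Cache()
db = {"user:1": "alice"}          # stand-in for the RDBMS

def read(key):
    value = cache.get(key)        # 1. try the cache
    if value is None:             # 2. cache miss: go to the database
        value = db.get(key)
        if value is not None:
            cache.put(key, value) # 3. populate the cache (the racy step!)
    return value

def write_delete(key):
    db.pop(key, None)             # 1. delete from the database
    cache.delete(key)             # 2. invalidate the cache entry
```

Note that the race from the question sits between steps 2 and 3 of `read`: if `write_delete` runs in that window, the cache is repopulated with data that no longer exists in the database.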
This article gives an interesting note on how Facebook (tries to) maintain cache consistency: http://www.25hoursaday.com/weblog/2008/08/21/HowFacebookKeepsMemcachedConsistentAcrossGeoDistributedDataCenters.aspx
Here's a gist from the article.
How about using a variable saved in memcache as a lock signal? Every single memcache command is atomic, so:

- after you retrieve data from the DB, toggle the lock on
- after you put the data into memcache, toggle the lock off
- before deleting from the DB, check the lock state
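The steps above can be sketched like this, using memcached's atomic `add` (which succeeds only if the key does not exist) as the lock. The `MiniCache` class is a toy in-memory stand-in so the sketch is self-contained; the key names and helper functions are assumptions for illustration:

```python
import threading

class MiniCache:
    """In-memory stand-in for memcached; add() is atomic, like the real command."""
    def __init__(self):
        self._data = {}
        self._mu = threading.Lock()
    def add(self, key, value):
        # Succeeds only if the key is absent -- memcached's add semantics.
        with self._mu:
            if key in self._data:
                return False
            self._data[key] = value
            return True
    def get(self, key):
        with self._mu:
            return self._data.get(key)
    def delete(self, key):
        with self._mu:
            self._data.pop(key, None)

cache = MiniCache()
LOCK_KEY = "lock:user:42"            # hypothetical lock-key name

def fill_cache_from_db(key, fetch_from_db):
    value = fetch_from_db()          # retrieve data from the DB first
    if not cache.add(LOCK_KEY, "1"): # toggle the lock on (atomic add)
        return False                 # someone else holds the lock; retry later
    cache.add(key, value)            # put the data into memcache
    cache.delete(LOCK_KEY)           # toggle the lock off
    return True

def delete_from_db(key, db_delete):
    if cache.get(LOCK_KEY) is not None:  # check the lock state first
        return False                     # a cache fill is in flight; retry later
    db_delete()                          # delete the row from the DB
    cache.delete(key)                    # invalidate the cache entry
    return True
```

Note this only narrows the window rather than closing it completely: a delete can still slip in between the DB fetch and the `add` of the lock.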
The code below gives some idea of how to use Memcached's `add`, `gets` and `cas` operations to implement optimistic locking, to ensure consistency of the cache with the database.

Disclaimer: I do not guarantee that it is perfectly correct and handles all race conditions. Also, consistency requirements may vary between applications.
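A minimal sketch of that `add`/`gets`/`cas` pattern, using a toy in-memory class with memcached-style semantics so it runs without a server (the stand-in, the sentinel value, and the helper names are assumptions, not real client code): install a sentinel with `add` before going to the database, then store the fetched value with `cas` so the store fails if a concurrent delete removed the key in the meantime.

```python
import itertools

class MiniCache:
    """Toy stand-in with memcached-style add/gets/cas semantics."""
    def __init__(self):
        self._data = {}                  # key -> (value, cas_token)
        self._tokens = itertools.count(1)
    def add(self, key, value):
        # Atomic: succeeds only if the key does not exist.
        if key in self._data:
            return False
        self._data[key] = (value, next(self._tokens))
        return True
    def gets(self, key):
        # Returns (value, cas_token), or (None, None) on a miss.
        return self._data.get(key, (None, None))
    def cas(self, key, value, token):
        # Stores only if nobody changed (or deleted) the key since our gets().
        if key not in self._data or self._data[key][1] != token:
            return False
        self._data[key] = (value, next(self._tokens))
        return True
    def delete(self, key):
        self._data.pop(key, None)

SENTINEL = "FETCHING"  # placeholder marking an in-flight DB read

def read_through(cache, key, fetch_from_db):
    value, token = cache.gets(key)
    if value is not None and value != SENTINEL:
        return value                   # cache hit
    cache.add(key, SENTINEL)           # atomic: only one reader installs it
    value, token = cache.gets(key)     # grab the sentinel's cas token
    db_value = fetch_from_db()
    # cas succeeds only if the sentinel is still in place; if a concurrent
    # delete removed the key, cas fails and we do not repopulate stale data.
    cache.cas(key, db_value, token)
    return db_value

def delete_row(cache, key, db_delete):
    db_delete()          # delete from the database
    cache.delete(key)    # removing the key invalidates any pending cas
```

The key point is the final `cas`: the delete path simply removes the key, which changes (destroys) the cas token, so a racing reader's attempt to write the now-deleted row back into the cache fails silently instead of leaving stale data behind.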