I have taken a database class this semester, and we are studying how to maintain cache consistency between the RDBMS and a cache server such as memcached. The consistency issues arise from race conditions. For example:
- Suppose I do a `get(key)` from the cache and there is a cache miss. Because of the miss, I fetch the data from the database, and then do a `put(key, value)` into the cache.
- But a race condition might happen: some other user might delete the data I fetched from the database, and this delete might happen before I do the `put` into the cache. Thus, ideally the `put` into the cache should not happen, since the data is no longer present in the database.
If the cache entry has a TTL, the entry will eventually expire; but there is still a window during which the data in the cache is inconsistent with the database.
I have been searching for articles/research papers that discuss this kind of issue, but I could not find any useful resources.
When you read, the following happens: check the cache first; on a miss, fetch the row from the database and then populate the cache with it.

When you write, the following happens: update (or delete) the row in the database first, then invalidate the corresponding cache entry.
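A minimal cache-aside sketch of those read and write paths (the `Cache` class and the `db` dict are illustrative stand-ins for a real memcached client and RDBMS, not actual library APIs):

```python
class Cache:
    """Toy stand-in for a memcached client."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

cache = Cache()
db = {"user:1": "alice"}          # stand-in for the RDBMS

def read(key):
    value = cache.get(key)        # 1. try the cache
    if value is None:             # 2. cache miss: go to the database
        value = db.get(key)
        if value is not None:
            cache.put(key, value) # 3. populate the cache (the racy step!)
    return value

def write_delete(key):
    db.pop(key, None)             # 1. delete from the database
    cache.delete(key)             # 2. invalidate the cache entry
```

Note that the race from the question sits between steps 2 and 3 of `read`: if `write_delete` runs in that window, the cache is repopulated with data that no longer exists in the database.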
This article gives an interesting note on how Facebook (tries to) maintain cache consistency: http://www.25hoursaday.com/weblog/2008/08/21/HowFacebookKeepsMemcachedConsistentAcrossGeoDistributedDataCenters.aspx
Here's a gist from the article.
How about using a variable saved in memcache as a lock signal? Every single memcache command is atomic, so:

- after you retrieve data from the DB, toggle the lock on
- after you put the data into memcache, toggle the lock off
- before deleting from the DB, check the lock state
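The steps above can be sketched like this, using memcached's atomic `add` (which succeeds only if the key does not exist) as the lock. The `MiniCache` class is a toy in-memory stand-in so the sketch is self-contained; the key names and helper functions are assumptions for illustration:

```python
import threading

class MiniCache:
    """In-memory stand-in for memcached; add() is atomic, like the real command."""
    def __init__(self):
        self._data = {}
        self._mu = threading.Lock()
    def add(self, key, value):
        # Succeeds only if the key is absent -- memcached's add semantics.
        with self._mu:
            if key in self._data:
                return False
            self._data[key] = value
            return True
    def get(self, key):
        with self._mu:
            return self._data.get(key)
    def delete(self, key):
        with self._mu:
            self._data.pop(key, None)

cache = MiniCache()
LOCK_KEY = "lock:user:42"            # hypothetical lock-key name

def fill_cache_from_db(key, fetch_from_db):
    value = fetch_from_db()          # retrieve data from the DB first
    if not cache.add(LOCK_KEY, "1"): # toggle the lock on (atomic add)
        return False                 # someone else holds the lock; retry later
    cache.add(key, value)            # put the data into memcache
    cache.delete(LOCK_KEY)           # toggle the lock off
    return True

def delete_from_db(key, db_delete):
    if cache.get(LOCK_KEY) is not None:  # check the lock state first
        return False                     # a cache fill is in flight; retry later
    db_delete()                          # delete the row from the DB
    cache.delete(key)                    # invalidate the cache entry
    return True
```

Note this only narrows the window rather than closing it completely: a delete can still slip in between the DB fetch and the `add` of the lock.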
The code below gives some idea of how to use Memcached's `add`, `gets` and `cas` operations to implement optimistic locking, to ensure consistency of the cache with the database.

Disclaimer: I do not guarantee that it is perfectly correct and handles all race conditions. Also, consistency requirements may vary between applications.
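A minimal sketch of that `add`/`gets`/`cas` pattern, using a toy in-memory class with memcached-style semantics so it runs without a server (the stand-in, the sentinel value, and the helper names are assumptions, not real client code): install a sentinel with `add` before going to the database, then store the fetched value with `cas` so the store fails if a concurrent delete removed the key in the meantime.

```python
import itertools

class MiniCache:
    """Toy stand-in with memcached-style add/gets/cas semantics."""
    def __init__(self):
        self._data = {}                  # key -> (value, cas_token)
        self._tokens = itertools.count(1)
    def add(self, key, value):
        # Atomic: succeeds only if the key does not exist.
        if key in self._data:
            return False
        self._data[key] = (value, next(self._tokens))
        return True
    def gets(self, key):
        # Returns (value, cas_token), or (None, None) on a miss.
        return self._data.get(key, (None, None))
    def cas(self, key, value, token):
        # Stores only if nobody changed (or deleted) the key since our gets().
        if key not in self._data or self._data[key][1] != token:
            return False
        self._data[key] = (value, next(self._tokens))
        return True
    def delete(self, key):
        self._data.pop(key, None)

SENTINEL = "FETCHING"  # placeholder marking an in-flight DB read

def read_through(cache, key, fetch_from_db):
    value, token = cache.gets(key)
    if value is not None and value != SENTINEL:
        return value                   # cache hit
    cache.add(key, SENTINEL)           # atomic: only one reader installs it
    value, token = cache.gets(key)     # grab the sentinel's cas token
    db_value = fetch_from_db()
    # cas succeeds only if the sentinel is still in place; if a concurrent
    # delete removed the key, cas fails and we do not repopulate stale data.
    cache.cas(key, db_value, token)
    return db_value

def delete_row(cache, key, db_delete):
    db_delete()          # delete from the database
    cache.delete(key)    # removing the key invalidates any pending cas
```

The key point is the final `cas`: the delete path simply removes the key, which changes (destroys) the cas token, so a racing reader's attempt to write the now-deleted row back into the cache fails silently instead of leaving stale data behind.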