I have a database of files that can be searched and browsed, and each file has copies on multiple servers.
I cache searches, browse pages and server locations (URLs). Say I delete a file: what's a good way to invalidate all searches, browse data and URLs for that file? Or, if a file server goes down, how do I invalidate all URLs pointing to that server?
Essentially I'm looking for something similar to memcache-tags, but using standard memcache and PHP components (without having to change anything on the web server itself). I need some sort of many-to-many relation between keys (one server has many files, and one file has multiple servers), but I can't figure out a good way to accomplish this. In some situations a stale cache is acceptable (minor updates etc.), but in others it is not (typically a delete or a server going down), and then I need to invalidate all cache items containing references to the file or server.
Some approaches I have looked at:
$ns_key = $memcache->get("foo_namespace_key");
// if not set, initialize it (and keep the new value for use below)
if ($ns_key === false) {
    $ns_key = rand(1, 10000);
    $memcache->set("foo_namespace_key", $ns_key);
}
// cleverly use the ns_key
$my_key = "foo_".$ns_key."_12345";
$my_val = $memcache->get($my_key);
// To clear the namespace, bump the version so old keys are never read again:
$memcache->increment("foo_namespace_key");
- Restricts the cache key to a single namespace
$files = array('file1', 'file2');
// Cache all files as single entries
foreach ($files as $file) {
    // $fileData is a placeholder for wherever the file record comes from
    $memcache->set($file.'_key', $fileData[$file]);
}
$search = array('file1_key', 'file2_key');
// Retrieve all items found by search (typically cached as file ids)
$results = array();
foreach ($search as $item) {
    $results[$item] = $memcache->get($item);
}
- Causes a problem when a file server goes down and every key containing URLs to that server has to be invalidated: it requires a large number of small cache items, which in turn means a large number of requests against the cache
- Breaks any chance of caching full objects and result sets
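As a rough illustration of the request-count concern above: the PECL Memcache client's get() also accepts an array of keys, so many small entries can be fetched in one call. The sketch below uses a hypothetical ArrayCache stand-in (so it runs without a server) and a hypothetical fetch_files() helper; neither is part of the original approach.

```php
<?php
// Stand-in for the memcache client so the sketch runs without a server.
// Assumption: like PECL Memcache::get(), get() accepts an array of keys and
// returns only the entries that were found, indexed by cache key.
class ArrayCache
{
    private $data = array();

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    public function get($keys)
    {
        if (!is_array($keys)) {
            return isset($this->data[$keys]) ? $this->data[$keys] : false;
        }
        return array_intersect_key($this->data, array_flip($keys));
    }
}

/** Hypothetical helper: fetch many per-file entries with a single get(). */
function fetch_files($cache, array $fileIds)
{
    $keys = array();
    foreach ($fileIds as $id) {
        $keys[] = $id . '_key';
    }
    $found = $cache->get($keys); // one call instead of count($keys) calls
    $missing = array_values(array_diff($keys, array_keys($found)));
    return array($found, $missing);
}

$cache = new ArrayCache();
$cache->set('file1_key', array('name' => 'file1'));
$cache->set('file2_key', array('name' => 'file2'));

// file3 was never cached, so it is reported as missing.
list($found, $missing) = fetch_files($cache, array('file1', 'file2', 'file3'));
```

This only reduces round trips; it does not by itself solve invalidating every entry that points at a given server.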
class KeyEnabled_Memcached extends Zend_Cache_Backend_Memcached
{
    private function getTagListId()
    {
        return "MyTagArrayCacheKey";
    }

    private function getTags()
    {
        if (!$tags = $this->_memcache->get($this->getTagListId())) {
            $tags = array();
        }
        return $tags;
    }

    private function saveTags($id, $tags)
    {
        // First get the tags
        $siteTags = $this->getTags();
        foreach ($tags as $tag) {
            $siteTags[$tag][] = $id;
        }
        $this->_memcache->set($this->getTagListId(), $siteTags);
    }

    private function getItemsByTag($tag)
    {
        $siteTags = $this->_memcache->get($this->getTagListId());
        return isset($siteTags[$tag]) ? $siteTags[$tag] : false;
    }
    /**
     * Save some string data into a cache record
     *
     * Note : $data is always "string" (serialization is done by the
     * core not by the backend)
     *
     * @param string $data Data to cache
     * @param string $id Cache id
     * @param array $tags Array of strings, the cache record will be tagged by each string entry
     * @param int $specificLifetime If != false, set a specific lifetime for this cache record (null => infinite lifetime)
     * @return boolean True if no problem
     */
    public function save($data, $id, $tags = array(), $specificLifetime = false)
    {
        $lifetime = $this->getLifetime($specificLifetime);
        if ($this->_options['compression']) {
            $flag = MEMCACHE_COMPRESSED;
        } else {
            $flag = 0;
        }
        $result = $this->_memcache->set($id, array($data, time()), $flag, $lifetime);
        if (count($tags) > 0) {
            $this->saveTags($id, $tags);
        }
        return $result;
    }
    /**
     * Clean some cache records
     *
     * Available modes are :
     * 'all' (default) => remove all cache entries ($tags is not used)
     * 'old' => remove too old cache entries ($tags is not used)
     * 'matchingTag' => remove cache entries matching all given tags
     * ($tags can be an array of strings or a single string)
     * 'notMatchingTag' => remove cache entries not matching one of the given tags
     * ($tags can be an array of strings or a single string)
     *
     * @param string $mode Clean mode
     * @param array $tags Array of tags
     * @return boolean True if no problem
     */
    public function clean($mode = Zend_Cache::CLEANING_MODE_ALL, $tags = array())
    {
        if ($mode == Zend_Cache::CLEANING_MODE_ALL) {
            return $this->_memcache->flush();
        }
        if ($mode == Zend_Cache::CLEANING_MODE_OLD) {
            $this->_log("Zend_Cache_Backend_Memcached::clean() : CLEANING_MODE_OLD is unsupported by the Memcached backend");
        }
        if ($mode == Zend_Cache::CLEANING_MODE_MATCHING_TAG) {
            $siteTags = $newTags = $this->getTags();
            if (count($siteTags)) {
                foreach ($tags as $tag) {
                    if (isset($siteTags[$tag])) {
                        foreach ($siteTags[$tag] as $item) {
                            // We call delete directly here because the ID in the cache is already specific for this site
                            $this->_memcache->delete($item);
                        }
                        unset($newTags[$tag]);
                    }
                }
                $this->_memcache->set($this->getTagListId(), $newTags);
            }
        }
        if ($mode == Zend_Cache::CLEANING_MODE_NOT_MATCHING_TAG) {
            $siteTags = $newTags = $this->getTags();
            if (count($siteTags)) {
                foreach ($siteTags as $siteTag => $items) {
                    if (array_search($siteTag, $tags) === false) {
                        foreach ($items as $item) {
                            $this->_memcache->delete($item);
                        }
                        unset($newTags[$siteTag]);
                    }
                }
                $this->_memcache->set($this->getTagListId(), $newTags);
            }
        }
        // The docblock promises a boolean; the tag modes previously returned nothing
        return true;
    }
}
- No control over which keys are invalidated or when: memcache's internal key eviction can drop the tag key, which in turn invalidates the tag information for a large number of valid keys (the items themselves would still exist)
- Issues with write concurrency
// Having one slow, and one fast cache mechanism where the slow cache is reliable storage containing a copy of tag versions
$cache_using_file['tag1'] = 'version1';
$cache_using_memcache['key'] = array('data' => 'abc', 'tags' => array('tag1' => 'version1'));
- Potential bottleneck using disk/mysql etc. for the slow cache
- Issues with write concurrency
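To make the tag-version idea above more concrete, here is a minimal sketch. Plain arrays stand in for both stores so that it runs anywhere: $tagVersions plays the slow, reliable store and $items plays memcache. All function names, keys and URLs are hypothetical, and the concurrency concerns listed above are not addressed.

```php
<?php
// Sketch only: arrays stand in for the slow store and for memcache.

/** Hypothetical helper: current version of a tag, created on first use. */
function tag_version(array &$tagVersions, string $tag): int
{
    if (!isset($tagVersions[$tag])) {
        $tagVersions[$tag] = 1;
    }
    return $tagVersions[$tag];
}

/** Store a value together with a snapshot of its tags' versions. */
function tagged_set(array &$items, array &$tagVersions, string $key, $data, array $tags): void
{
    $snapshot = array();
    foreach ($tags as $tag) {
        $snapshot[$tag] = tag_version($tagVersions, $tag);
    }
    $items[$key] = array('data' => $data, 'tags' => $snapshot);
}

/** Return the value, or false if it is missing or any tag has moved on. */
function tagged_get(array &$items, array &$tagVersions, string $key)
{
    if (!isset($items[$key])) {
        return false;
    }
    foreach ($items[$key]['tags'] as $tag => $version) {
        if (tag_version($tagVersions, $tag) !== $version) {
            unset($items[$key]); // stale: drop it lazily on read
            return false;
        }
    }
    return $items[$key]['data'];
}

/** Invalidate everything carrying $tag by bumping its version. */
function invalidate_tag(array &$tagVersions, string $tag): void
{
    $tagVersions[$tag] = tag_version($tagVersions, $tag) + 1;
}

$tagVersions = array();
$items = array();

// Each URL entry depends on a file and a server (many-to-many via tags).
tagged_set($items, $tagVersions, 'url_file1_srvA', 'http://a.example/file1', array('file:1', 'server:A'));
tagged_set($items, $tagVersions, 'url_file1_srvB', 'http://b.example/file1', array('file:1', 'server:B'));

invalidate_tag($tagVersions, 'server:A'); // server A goes down

$fromA = tagged_get($items, $tagVersions, 'url_file1_srvA'); // stale now
$fromB = tagged_get($items, $tagVersions, 'url_file1_srvB'); // still valid
```

Nothing is deleted at invalidation time: stale entries are simply rejected when read, so invalidating a tag costs one write regardless of how many items carry it.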
Having come across the comment here, which explains the logic of evicting existing keys, I believe tags can be implemented reliably with the version flags approach mentioned in: PHP memcache design patterns
I actually implemented this logic once already, but discarded it as unreliable because memcache can evict elements before they expire. You can find my initial implementation here. I do, however, believe this is a reliable tags pattern because:
Please correct me if I'm wrong! :-)
See Organizing memcache keys
Unless you can "master key" items, there's no sane way to do this. By this I mean something like "user4231-is_valid". You could check that for anything that used that user's data. Otherwise, unless you're tracking everything that references your file in question, you can't invalidate all of them. If you do that, you still have to iterate all possibilities in order to successfully delete.
Document your dependencies, limit your dependencies, track your dependencies in your code for deletion activities.
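A minimal sketch of the "master key" check described above, assuming a plain array in place of memcache so that it runs without a server. The helper names (mark_valid, mark_invalid, get_if_valid) and the example keys are made up for illustration.

```php
<?php
// Sketch only: $cache is a plain array standing in for memcache.
$cache = array();

/** Hypothetical helper: mark an entity as valid, e.g. after (re)loading it. */
function mark_valid(array &$cache, string $entity): void
{
    $cache[$entity . '-is_valid'] = true;
}

/** Hypothetical helper: drop the master key so dependents stop being served. */
function mark_invalid(array &$cache, string $entity): void
{
    unset($cache[$entity . '-is_valid']);
}

/** Read a dependent entry only while every entity it relies on is valid. */
function get_if_valid(array $cache, string $key, array $entities)
{
    foreach ($entities as $entity) {
        if (empty($cache[$entity . '-is_valid'])) {
            return false; // a master key is missing: treat as a cache miss
        }
    }
    return isset($cache[$key]) ? $cache[$key] : false;
}

mark_valid($cache, 'file12');
mark_valid($cache, 'server3');
$cache['search_foo'] = array(12); // a cached search that lists file 12

// The caller has to know which entities the entry depends on.
$before = get_if_valid($cache, 'search_foo', array('file12', 'server3'));

mark_invalid($cache, 'server3'); // server 3 goes down
$after = get_if_valid($cache, 'search_foo', array('file12', 'server3'));
```

The check fails safe: if a master key is lost for any reason, its dependents read as misses. The cost is that every read needs the list of entities it depends on, which is the dependency tracking the answer asks for.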
I have no experience with memcached, but I understand that I/O is cheap there.
I'd go with your tag implementation, make sure the tag list is used frequently, and hope that memcached's internal logic "thinks" it's too busy to be dropped :)