We use memcache basically as an afterthought, just to cache query results.
Invalidation is a nightmare because of the way it was implemented. We have since learned some techniques with memcache from reading the mailing list, for example the trick that allows group invalidation of a bunch of keys. If you already know it, skip the next paragraph.
For those who don't know it and are interested: the trick is to add a sequence number to your keys and to store that sequence number in memcache. Before every "get" you fetch the current sequence number and build your keys around it. To invalidate the whole group, you just increment the sequence number. The old entries are never requested again and simply age out of the cache.
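For anyone who prefers to see it in code, here is a minimal sketch of that sequence-number trick in Python. `FakeMemcache` is a hypothetical in-memory stand-in so the example runs without a server, and the helper names (`group_version`, `group_key`, `invalidate_group`) and the `ver:` key prefix are my own assumptions, not part of any library. A real client would offer the same get/set/add/incr operations.

```python
import time


class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True

    def add(self, key, value):
        # Like memcached "add": only stores if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        # Like memcached "incr": fails (None) if the key is absent.
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]


def group_version(mc, group):
    """Return the current sequence number for a group, creating it if needed."""
    version_key = f"ver:{group}"
    version = mc.get(version_key)
    if version is None:
        # Seed with the clock (an assumption of this sketch) so that a
        # version key that was evicted does not restart at an old number.
        mc.add(version_key, int(time.time()))
        version = mc.get(version_key)
    return int(version)


def group_key(mc, group, name):
    """Build the real cache key around the group's current sequence number."""
    return f"{group}:v{group_version(mc, group)}:{name}"


def invalidate_group(mc, group):
    """Invalidate every key in the group by bumping the sequence number."""
    if mc.incr(f"ver:{group}") is None:
        mc.add(f"ver:{group}", int(time.time()))


mc = FakeMemcache()
mc.set(group_key(mc, "users", "list:page1"), ["alice", "bob"])
print(mc.get(group_key(mc, "users", "list:page1")))  # cached value
invalidate_group(mc, "users")
print(mc.get(group_key(mc, "users", "list:page1")))  # miss after invalidation
```

The cost is one extra "get" for the sequence number on every lookup, which is usually a fair price for not having to track and delete each key in the group.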
So anyway, I'm currently revising our model to implement this.
My question is this:
We didn't know about this pattern, and I'm sure there are others we don't know about. I've searched and haven't been able to find any design patterns on the web for implementing memcache, best practices, etc.
Can someone point me to something like this, or even just write up an example? I would like to make sure we don't make a beginner's mistake in our new refactoring.
We also store the query results from our database (PostgreSQL) in memcache, and we use triggers on the tables to invalidate the cache. There are several APIs out there for this (e.g. pgmemcache; I think MySQL has something like that too, but I don't know for sure). The benefit is that the database itself can handle invalidation through triggers whenever data changes (update, insert, delete), so you don't need to write all that logic into your application.
One point to remember with object caching is that it's just that - a cache of objects/complex structures. A lot of people make the mistake of hitting their caches for straightforward, efficient queries, which incurs the overhead of a cache check/miss, when the database would have obtained the result far faster.
This piece of advice is one I've taken to heart since it was taught to me: know when not to cache, that is, when the overhead cancels out the perceived benefits. I know it doesn't answer the specific question here, but I thought it was worth pointing out as a general hint.
I use the Zend Cache component (you don't have to use the entire framework; you can use just the Zend Cache parts if you want). It abstracts some of the caching work. It supports grouping cached items by 'tags', and although that feature is not supported for the memcache back end, I've rolled my own support for 'tags' with relative ease. So the pattern I use for functions that access the cache (generally in my model) is:
Basing the cache key on a hash of the query means that if another developer bypasses my models, or uses another function elsewhere that runs the same query, the result is still pulled from cache. Generally I tag each cached item with a couple of generic tags (one is the name of the table, the other is the name of the function). So by default our code invalidates the cached items tagged with a table's name on every insert, delete and update to that table. All in all, caching is pretty automatic in our base code, and developers can be confident that caching 'just works' in the projects we do. (A great side effect of tagging is that we have a page that offers granular cache clearing/management, with options to clear the cache by model function or by table.)
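To make the idea concrete, here is a rough sketch in Python of query-hash keys combined with tags. It is not Zend Cache's API and not the poster's code. `DictCache`, `query_key`, `cached_query` and `invalidate_tag` are hypothetical names, and implementing tags as per-tag version counters is just one assumed way of doing it on a back end that has no native tag support.

```python
import hashlib


class DictCache:
    """Hypothetical in-memory stand-in for a memcache-style client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def query_key(sql, params=()):
    """Derive the cache key from a hash of the query text and its parameters."""
    raw = repr((sql, tuple(params))).encode("utf-8")
    return "q:" + hashlib.sha1(raw).hexdigest()


def tag_versions(cache, tags):
    """Look up the current version counter of each tag (0 if never bumped)."""
    return {tag: cache.get(f"tag:{tag}") or 0 for tag in tags}


def cached_query(cache, sql, params, tags, run_query):
    """Return cached rows if every tag is unchanged, else run the query."""
    key = query_key(sql, params)
    entry = cache.get(key)
    if entry is not None and entry["tags"] == tag_versions(cache, entry["tags"]):
        return entry["rows"]
    rows = run_query(sql, params)
    # Store the tag versions seen at write time alongside the rows.
    cache.set(key, {"rows": rows, "tags": tag_versions(cache, tags)})
    return rows


def invalidate_tag(cache, tag):
    """Bump a tag's version; every entry stored under it becomes stale.

    The get-then-set here is not atomic. Against a real memcached server
    you would use its atomic increment instead.
    """
    cache.set(f"tag:{tag}", (cache.get(f"tag:{tag}") or 0) + 1)


def fake_db(sql, params):
    # Stand-in for a real database call, for demonstration only.
    print("hitting the database")
    return [("alice",)]


cache = DictCache()
sql = "SELECT name FROM users WHERE id = %s"
cached_query(cache, sql, (1,), ["users", "get_user"], fake_db)  # runs the query
cached_query(cache, sql, (1,), ["users", "get_user"], fake_db)  # served from cache
invalidate_tag(cache, "users")  # e.g. after an insert/update/delete on users
cached_query(cache, sql, (1,), ["users", "get_user"], fake_db)  # runs the query again
```

Because the key depends only on the query and its parameters, any code path that issues the same query shares the same cache entry, and a write to the table only has to bump one tag.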
mysqlnd_qc hooks in at the level where query results are returned from the database and automatically caches result sets from MySQL; memcache can be used as its storage. It is FANTASTIC and automatic.
What rob is saying is good advice. From my experience, there are two common ways to identify and invalidate tags: unique identification and tag-based identification. Those are usually combined to form a complete solution in which:
This is relatively simple to implement and generally works very well. I have yet to come across a system that needed more, though there are probably some edge cases out there that require specific solutions.