I have a SQL-based application and I would like to cache its results using Redis. You can think of the application as an address book with multiple SQL tables. The application performs the following tasks:
40% of the time:
- Create a new record / Update an existing record
- Bulk update multiple records
- Review an existing record
60% of the time:
- Search records based on user's criteria
This is my current approach:
- The system caches a record when it is created or updated.
- When a user performs a search, the system caches the query result.
On top of that, I have a Redis look-up table (a Redis Set) which stores the MySQL record ID together with the Redis cache keys that contain that record. That way I can delete the affected Redis caches when a MySQL record has been changed (e.g., by a bulk update).
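A minimal sketch of that look-up idea, assuming each cached entry is registered under every record ID it was built from. Plain Python dicts and sets stand in for Redis here so the example runs without a server; against a real Redis, SADD, SMEMBERS and DEL would play the same roles. The class and key names are hypothetical.

```python
from collections import defaultdict


class CacheWithReverseIndex:
    """In-memory stand-in for Redis: a key/value cache plus one set of
    cache keys per record ID (all names here are hypothetical)."""

    def __init__(self):
        self.cache = {}                          # cache key -> cached value
        self.keys_by_record = defaultdict(set)   # record ID -> cache keys

    def put(self, cache_key, value, record_ids):
        """Store a value and remember which records it was built from."""
        self.cache[cache_key] = value
        for record_id in record_ids:
            self.keys_by_record[record_id].add(cache_key)

    def get(self, cache_key):
        return self.cache.get(cache_key)

    def invalidate_records(self, record_ids):
        """Delete every cached entry that contains any of the given records."""
        for record_id in record_ids:
            for cache_key in self.keys_by_record.pop(record_id, set()):
                self.cache.pop(cache_key, None)


store = CacheWithReverseIndex()
store.put("record:1", {"id": 1, "city": "Oslo"}, [1])
store.put("search:city=Oslo", [1, 2], [1, 2])
store.put("search:city=Rome", [3], [3])

# Record 2 was touched by a bulk update: only caches containing it are dropped.
store.invalidate_records([2])
```

Note that this only covers records that were already in a cached result when it was stored; a brand-new record has no entry in the look-up table.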
What if a new record is created after the system caches the search result? If the new record matches the search criteria, the system will keep returning the old cache (which does not include the new record) until the cache is deleted (which won't happen until an existing record in the cache is updated).
The search is driven by the users, and the number of possible combinations of search conditions is practically unlimited. It is not feasible to work out which caches should be deleted when a new record is created.
So far, the only solution I have is to remove all caches for a MySQL table whenever a record is created. However, this is not a good choice because lots of records are created daily.
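For illustration, one common way to "remove all caches of a table" without deleting keys one by one is to put a per-table version number into every cache key and bump it on each write. This is a swapped-in technique, not something described in the post, and it does not reduce how often the cache is emptied; it only makes the invalidation itself cheap. The sketch uses dicts in place of Redis, and the names are hypothetical. With Redis, the version bump could be an INCR and stale entries could be given a TTL with EXPIRE.

```python
class VersionedQueryCache:
    """Query cache whose keys embed a per-table version number."""

    def __init__(self):
        self.store = {}          # stands in for Redis key/value storage
        self.table_version = {}  # table name -> current version number

    def _key(self, table, query):
        version = self.table_version.get(table, 0)
        return f"{table}:v{version}:{query}"

    def get(self, table, query):
        return self.store.get(self._key(table, query))

    def put(self, table, query, result):
        self.store[self._key(table, query)] = result

    def invalidate_table(self, table):
        """Call after any INSERT/UPDATE/DELETE on the table.
        Old entries become unreachable because their keys carry the old version."""
        self.table_version[table] = self.table_version.get(table, 0) + 1


cache = VersionedQueryCache()
cache.put("contacts", "city=Oslo", [1, 2])
hit_before = cache.get("contacts", "city=Oslo")   # served from cache

cache.invalidate_table("contacts")                # a new contact was created
hit_after = cache.get("contacts", "city=Oslo")    # miss: must re-query MySQL
```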
In this situation, what's the best way to implement Redis on top of MySQL?
We ran into the same problem and chose to do the same thing you are thinking of: remove all query caches affected by the table. It is not ideal, as you said, but fortunately our share of writes is not as high as 40%, so it has been OK so far. That's the nature of query-based caching.
As an alternative, you can add entity-based caching: instead of caching only the search result, cache the entire table and do the search in memory. We use C# LINQ, so we can run fairly common queries in memory, but if the search is too complicated then you are out of luck.
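A rough Python analogue of the entity-based approach, assuming records fit in memory and searches are simple equality filters. The class and field names are made up for illustration; the answer's own setup uses C# LINQ.

```python
class EntityCache:
    """Keeps whole records in memory and filters them on demand,
    so there are no cached search results to invalidate."""

    def __init__(self):
        self.records = {}  # record ID -> record dict

    def upsert(self, record):
        """Apply a create or update from the database to the cache."""
        self.records[record["id"]] = record

    def search(self, **criteria):
        """Return records whose fields equal all given criteria."""
        return [
            r for r in self.records.values()
            if all(r.get(field) == value for field, value in criteria.items())
        ]


contacts = EntityCache()
contacts.upsert({"id": 1, "name": "Ada", "city": "Oslo"})
contacts.upsert({"id": 2, "name": "Bo", "city": "Rome"})
first = contacts.search(city="Oslo")

# A newly created record shows up in the next search immediately.
contacts.upsert({"id": 3, "name": "Cy", "city": "Oslo"})
second = contacts.search(city="Oslo")
```

Because every search runs against the current records, a new record can never be hidden behind a stale search result; the cost is that each search scans the cached table.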
Here's a surprising thing when it comes to PHP and MySQL (I am not sure about other languages) - not caching stuff in memcached or Redis is actually faster. Much faster. Basically, if you just build your app and query MySQL directly, you get more out of it.
Now for the "why" part.
InnoDB, the default engine, is a superb engine. Specifically, its memory management (allocation and whatnot) is superior to that of any memory storage solution. That's a fact, you can look it up or take my word for it - it will perform at least as well as Redis.
Now what happens in your app - you query MySQL and cache the result in Redis. However, MySQL is also smart enough to keep recently used data cached in memory. What you just did is create an additional file descriptor that's required to connect to Redis. You also used some storage (RAM) to cache a result that MySQL had already cached.
Here comes another interesting part - the preferred way of serving PHP scripts is by using php-fpm - it's much quicker than any mod_* crap out there. Down to the core, php-fpm is a supervisor process that spawns child processes. They don't shut down after the script is served, which means they can keep connections to MySQL open (provided you use persistent connections) - connect once, use multiple times. Basically, if you serve scripts using php-fpm, they will reuse the already established connection to MySQL, meaning that you won't be opening and closing connections for each request - this is extremely resource friendly and it gives you a lightning-fast connection to MySQL. MySQL, being memory efficient and having the result cached, is much quicker than Redis.
Now what does all of this mean for you - a proper setup gives you small, simple code that doesn't involve Redis, eliminates all the problems you might have with cache invalidation and whatnot, and doesn't waste your memory holding the same data twice.
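The "connect once, use multiple times" behaviour of a long-lived worker can be sketched as follows. This is only an illustration in Python: sqlite3 stands in for MySQL so the example runs anywhere, and the Worker class is a hypothetical stand-in for a php-fpm child process, not how php-fpm is actually implemented.

```python
import sqlite3


class Worker:
    """Stand-in for a long-lived worker process that serves many requests."""

    def __init__(self, dsn):
        self.dsn = dsn
        self._conn = None
        self.connects = 0   # how many times a connection was actually opened

    def connection(self):
        # Open the connection lazily, once, and keep it for later requests.
        if self._conn is None:
            self._conn = sqlite3.connect(self.dsn)
            self.connects += 1
        return self._conn

    def handle_request(self, sql, params=()):
        """Serve one request using the already established connection."""
        return self.connection().execute(sql, params).fetchall()


worker = Worker(":memory:")
worker.handle_request("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
worker.handle_request("INSERT INTO contacts (name) VALUES (?)", ("Ada",))
rows = worker.handle_request("SELECT name FROM contacts")
```

Three requests are served here, but the connection is opened only once; the per-request connect and disconnect cost disappears.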
Ingredients you need for this to work:
- php-fpm
- MySQL and InnoDB based tables
- most of all, sufficient RAM and a tweaked innodb_buffer_pool_size variable. That one controls how much RAM InnoDB is allowed to allocate for its purposes - the larger the better.
You eliminated Redis from the game, you kept your code simple and easy to maintain, you didn't duplicate data, you didn't bring an additional system into play and you let software that's meant to take care of data do its job. Pretty cheap trade-off for maximum usefulness, even if you compile all the software from scratch - it won't take more than an hour or so to get it up and running.
Or, you can just ignore what I wrote and look for a solution using Redis.