I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient?
I would keep my caching responsibilities firmly within the model. It is none of the controller's or view's business where the model is getting data. All they care about is that when data is requested, data is provided - this is how the MVC paradigm is supposed to work.
(Source: Post by Jarrod)
The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway?
How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?
Another poster said this:
I would suggest using the md5 hash of your query combined with a serialized version of your input arguments.
Is the minuscule chance of collision worth worrying about?
Conceptually, caching in the Model seems like a good idea to me, but it seems in practicality and due to the nature of caching the developer should have direct control over it and explicity code it into the controller logic.
Update for Bounty
I am indeed using an extremely lightweight ORM somewhat similar to ActiveRecord but is capable of doing complex joins and subqueries without the n^2
problem. I built it myself, so it is flexible and isn't restrictive in terms of relations or column names, and I just want to understand how I should implement the caching mechanism.
Following the advice of the helpful people, I would take a hash (probably md5) of the query concatenated with a list of its parameters, and use this as the key for that particular data store. Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?
How do I know when it should be invalidated? Would I have to parse the UPDATE/DELETE/INSERT queries and sub in parameters manually to find out which records are being modified? Or worse, do additional queries whenever data is modified to keep track of which things have changed and what should be invalidated?
I will award the bounty to whoever can give me a clear conceptual explanation (whether or not this is really necessary/efficient to be done transparently), and if so, has some implementation details for the Model caching. I am using PHP and MySQL if that helps to narrow your focus.
There are quite a few factors to consider with caching, such as hashing, invalidation, etc. But the goal of caching is always the same: to reduce response times and resource consumption.
Here are a couple of quick thoughts off the top of my head for systems that do not use ORM:
SELECT
queries since other types affect dataserialize()
'd version of the parameters (this identifies unique queries. Seralizing parameters is not an issue because the size of parameters generally passed to select queries is usually quite trivial). Serializing isn't as expensive as you think. And because you hashed your static query concatenated with your dynamic params, you should never have to worry about collisions.INSERT
/UPDATE
/DELETE
) to rows in a model should invalidate (or set a TTL) on all items cached for that modelORDER BY
/LIMIT
type operations in a new (modified) query rather than to pull an entire rowset from cache and manipulate it through PHP to achieve the same thing (unless there is very high latency between your web and database servers).Attempting to manage cache validation for an ORM system is a completely different beast (due to relations), and should probably be handled on a case-by-case basis (in the controller). But if you're truly concerned with performance, chances are you wouldn't be using an ORM to begin with.
UPDATE:
If you find yourself using multiple instances of the same model class within a single thread, I would suggest also potentially memcaching your instantiated model (depending on your constructor, deserializing and waking an object is sometimes more efficient than constructing an object). Once you have an intialized object (whether constructed or deserialized), it is worlds more efficient to
clone()
a basic instance of an object and set its new state rather than to reconstruct an object in PHP.Who else is better suited to track any of that? Multiple controllers will be using the same model to fetch the data they need. So how in the world would a controller be able to make a rational decision?
There are no hard and fast rules -- a smart caching strategy is almost completely driven by context. The business logic (again, models!) is going to dictate what sorts of things ought to be in the cache, when the cache needs to be invalidated, etc.
You're absolutely right that caching search results seems like a bad idea. I'm sure it usually is. It's possible that if your search results are very expensive to generate, and you're doing something like pagination, you might want a per-user cache that holds the most recent results, along with the search parameters. But I think that's a fairly special case.
It's difficult to give more specific advice without the context, but here are a couple of scenarios:
1) You have business objects that can have a category assigned. The categories rarely change. Your Category model ought to cache the full set of categories for read operations. When the infrequent right operations occur, they can invalidate the cache. Every view script in the system can now query the model and get the current categories back (for rendering select boxes, let's say) without concerning itself with the cache. Any controller in the system can now add/update/delete categories without knowing about the cache.
2) You have some complex formula that consumes multiple inputs and creates a popularity rating for some kind of "products". Some widget in your page layout shows the 5 most popular objects in summary form. Your Product model would provide a getPopular() method, which would rely on the cache. The model could invalidate the cache every X minutes, or some background process could run at regular intervals to invalidate/rebuild. No matter what part of the system wants the popular products, they request it via the model, which transparently manages the cache.
The exact caching implementation is highly dependent on the sort of data you're manipulating, combined with the typical use cases.
The caveat here is that if you're abusing ActiveRecord, and/or composing SQL queries (or equivalents) in your controllers, you're probably going to have issues. Doing smart caching is a lot easier if you've got a nice, rich, model layer that accurately models your domain, instead of flimsy models that just wrap database tables.
It's not about the Models being smart, it's about the developer being smart.
You should have a seperate Model which does the SQL interfacing directly, eg. for a Customers table:
$CustomerModel->GetCustomers($parameter);
et cetera. Then, in those models, you can implement caching transparently without having to edit any of your existing MVCs.I would recommend that you look here for a comprehensive look at caching in ORM's including the problems and solutions that can be applied.
When dealing with caching data in an ORM, you generally have the following 3 problems to solve:
Your post only makes any sense if the model is a trivial ORM. And there are lots of reasons why that's a bad thing. Try thinking about the model as if it were a web service.
Caching is the responsiblity of the model.
But the inputs to the model uniquely define its output.
If you're using the same model to retrieve the contents of a shopping basket and to run a search on your product catalog then there's something wrong with your code.
Even in the case of the shopping basket, there may be merit in caching data with a TTL of less than the time taken to process a transaction which would change its contents, in the case of the catalog search, caching the list of matching products for a few hours will probably have no measurable impact on sales, but trade-off well in reducing database load.
The fact that you are using a trivial ORM out of the box does not exclude you from wrapping it in your own code.
No. You make the determination on whether to cache, and if you can't ensure that the cache is consistent then enforce a TTL based on the type of request.
As a general rule of thumb, you should be able to predict appropriate TTLs based on the SELECT query before binding any variables and this needs to be implemented at design time - but obviously the results should be indexed based on the query after binding.
For preference I would implement this as a decorator on the model class - that way you can easily port it to models which implement a factory rather than trivial ORM.
C.
This isn't really an answer, but your question reminded me I had seen this chapter which describes, I think, how to do what you want to do using the Doctrine ORM with Symfony. You might want to compare with that approach/implementation.
Basically, the approach there doesn't try for "astronomically smart" but allows the programmer to manually specify result sets to cache based on the volatility of the data and its performance impact... I suppose you could approximate that decision and recalculate it nightly based on actual metrics or something.