Implementing Model-level caching

Posted 2019-03-08 12:09

I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient?

I would keep my caching responsibilities firmly within the model. It is none of the controller's or view's business where the model is getting data. All they care about is that when data is requested, data is provided - this is how the MVC paradigm is supposed to work.

(Source: Post by Jarrod)

The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway?

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

Another poster said this:

I would suggest using the md5 hash of your query combined with a serialized version of your input arguments.

Is the minuscule chance of collision worth worrying about?

Conceptually, caching in the Model seems like a good idea to me, but in practice, and given the nature of caching, it seems the developer should have direct control over it and explicitly code it into the controller logic.


Update for Bounty

I am indeed using an extremely lightweight ORM, somewhat similar to ActiveRecord, but capable of doing complex joins and subqueries without the n^2 problem. I built it myself, so it is flexible and not restrictive in terms of relations or column names, and I just want to understand how I should implement the caching mechanism.

Following the advice of the helpful people, I would take a hash (probably md5) of the query concatenated with a list of its parameters, and use this as the key for that particular data store. Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?
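
For illustration, here is a minimal PHP sketch of that key-generation step, assuming the ORM exposes the SQL string and its bound parameters at the point where caching happens (the function name is hypothetical); sha1 could be substituted for md5 if the collision risk is a concern:

```php
<?php
// Build a cache key from the query text plus a serialized copy of its
// bound parameters. Illustrative only; not tied to any particular ORM.
function cacheKeyFor(string $sql, array $params): string
{
    // serialize() preserves parameter order and types; md5 keeps the key short.
    return 'query:' . md5($sql . '|' . serialize($params));
}

$sql    = 'SELECT * FROM articles WHERE author_id = ? AND published = ?';
$params = [42, 1];
$key    = cacheKeyFor($sql, $params);   // same query + same params => same key
```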

How do I know when it should be invalidated? Would I have to parse the UPDATE/DELETE/INSERT queries and sub in parameters manually to find out which records are being modified? Or worse, do additional queries whenever data is modified to keep track of which things have changed and what should be invalidated?

I will award the bounty to whoever can give me a clear conceptual explanation (whether or not this is really necessary/efficient to be done transparently), and if so, has some implementation details for the Model caching. I am using PHP and MySQL if that helps to narrow your focus.

8 Answers
劫难
2019-03-08 12:39

What we did was build a cache layer as a replacement for the MVC's model-loading function. This way, only the model calls that we actually want cached will be cached. If caching is unnecessary or unwanted, the model is called from the controller in the normal way.

If a model is called through the cache layer, together with its parameters, the cache layer first checks the requested data against the cache pool. If the cached data is still valid, the actual model is not loaded and the cached data is simply returned to the controller. If not, the model is called as it normally would be.
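
As a rough sketch of that flow (all class and method names here are made up, and a plain array stands in for the real cache pool), such a layer might look something like this in PHP:

```php
<?php
// Sketch of a cache layer that sits between controller and model.
// The controller asks the cache layer for data; the layer returns cached
// data when present and still valid, and only falls back to the real
// model call on a miss. All names here are hypothetical.
class CacheLayer
{
    /** @var array<string, array{expires: int, value: mixed}> */
    private array $pool = [];   // stand-in for memcached/Redis/APCu

    public function remember(string $key, int $ttl, callable $loadModel)
    {
        $entry = $this->pool[$key] ?? null;
        if ($entry !== null && $entry['expires'] > time()) {
            return $entry['value'];           // cache hit: the model is never loaded
        }
        $value = $loadModel();                // cache miss: call the model normally
        $this->pool[$key] = ['expires' => time() + $ttl, 'value' => $value];
        return $value;
    }
}

// Usage from a controller: wrap the model call only where caching is wanted.
$cache    = new CacheLayer();
$articles = $cache->remember('articles:front-page', 60, function () {
    // return ArticleModel::findFrontPage();  // the real model call would go here
    return ['placeholder result'];
});
```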

It is really useful to be able to do this in a layer above the model, since it becomes very easy to introduce semaphore locks on a per-query / per-model level to reduce server load even further.
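
For example, a per-key lock can ensure that only one request regenerates an expired entry while the rest wait briefly or fall back to the cache. The sketch below uses APCu's add-if-absent behaviour as the lock rather than a SysV semaphore, purely as an illustration; it assumes the APCu extension is available, and the function name is hypothetical:

```php
<?php
// Sketch of a per-query lock to avoid a cache stampede: when an entry
// expires, only one request rebuilds it while the others retry the cache.
function withQueryLock(string $cacheKey, callable $regenerate)
{
    $lockKey = 'lock:' . $cacheKey;

    // apcu_add() succeeds only if the key does not exist yet, so exactly
    // one process wins the lock; the 10-second TTL guards against deadlock.
    if (apcu_add($lockKey, 1, 10)) {
        try {
            return $regenerate();          // we hold the lock: rebuild the cache entry
        } finally {
            apcu_delete($lockKey);
        }
    }

    // Someone else is rebuilding; wait briefly and try the cache again.
    usleep(100000);
    return apcu_fetch($cacheKey) ?: $regenerate();
}
```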

The biggest advantage to me, though, is that the models are designed as intended and contain nothing but pure database queries. This way, it is possible to modify a model in production without end users even noticing (assuming, of course, that the data the model delivers does not need to be regenerated during the update).

Update: We have also implemented namespacing inside our cache layer on two levels: a per-model basis and an optional group basis. Thanks to that, we can easily invalidate all previously cached data that comes from a model upon an update or deletion in the database.
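
One common way to get that effect is to embed a per-namespace version number in every cache key; bumping the version for a model (or group) makes all of its old keys unreachable, so the stale entries simply age out of the backing store. The sketch below is only a guess at the general pattern, not the poster's actual implementation; all names and the array-backed store are illustrative:

```php
<?php
// Two-level namespacing for invalidation: every cached query key embeds a
// version number for its model namespace (and optionally a group namespace).
class NamespacedCache
{
    private array $pool = [];       // stand-in for memcached/Redis
    private array $versions = [];   // namespace => current version

    private function version(string $ns): int
    {
        return $this->versions[$ns] ?? 1;
    }

    public function keyFor(string $model, string $queryHash, ?string $group = null): string
    {
        $key = 'm:' . $model . ':v' . $this->version('model:' . $model);
        if ($group !== null) {
            $key .= ':g:' . $group . ':v' . $this->version('group:' . $group);
        }
        return $key . ':' . $queryHash;
    }

    public function invalidateModel(string $model): void
    {
        // Old keys are never matched again and simply age out of the store.
        $this->versions['model:' . $model] = $this->version('model:' . $model) + 1;
    }

    public function get(string $key)               { return $this->pool[$key] ?? null; }
    public function set(string $key, $value): void { $this->pool[$key] = $value; }
}

// After an UPDATE/DELETE touching the Article model, everything cached for
// that model can be dropped in a single call:
$cache = new NamespacedCache();
$cache->set($cache->keyFor('Article', md5('SELECT ...')), ['rows']);
$cache->invalidateModel('Article');
```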

叼着烟拽天下
2019-03-08 12:39

If you were interested in a more transparent caching system for an active record library, you could assign an id to each query and then create an associative array of the result. You could store this relationship statically or, somewhat ironically, in a database. (It's the kind of trade-off caching involves: sometimes you have to use more computing power in order to use less.)

Keep track of the resulting hash every time the query is run. If the result hash is different, update the stored hash; if the hash is the same, increment a count of duplicate results. Once the desired number of repeat results comes up, cache the result and stop checking the table for an allotted amount of time and/or number of subsequent runs of the query.

You would have a class that controls all of these goings-on. Its functions could include things like:

-start cache checking
-set threshold
-cache always
-cache time-to-live
-force clear all caches
-clear the cache for this query
-we have been hit with the death laser and need to cache everything (the "I hate you, WordPress, I'm never using you again" function; I shouldn't have been so lazy and should have built my own website)

This would help to automate much of the process. Cache rules could also be implemented on a model-by-model basis or applied to the application as a whole.
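
A hedged sketch of what such a class could look like in PHP, combining the repeat-count idea above with a few of the listed controls (all names are invented, and an in-memory array stands in for a real cache backend):

```php
<?php
// Watches each query's result hash and only starts caching once the result
// has come back unchanged a configurable number of times. Illustrative only.
class AdaptiveQueryCache
{
    private int $threshold;
    private array $stats = [];   // queryId => ['hash' => string|null, 'repeats' => int]
    private array $cache = [];   // queryId => cached result

    public function __construct(int $threshold = 3)
    {
        $this->threshold = $threshold;
    }

    public function setThreshold(int $threshold): void { $this->threshold = $threshold; }

    public function fetch(string $queryId, callable $runQuery)
    {
        if (array_key_exists($queryId, $this->cache)) {
            return $this->cache[$queryId];               // already promoted to the cache
        }

        $result = $runQuery();
        $hash   = md5(serialize($result));
        $stat   = $this->stats[$queryId] ?? ['hash' => null, 'repeats' => 0];

        // Same result as last time? Count it; otherwise start counting again.
        $stat = ($stat['hash'] === $hash)
            ? ['hash' => $hash, 'repeats' => $stat['repeats'] + 1]
            : ['hash' => $hash, 'repeats' => 0];
        $this->stats[$queryId] = $stat;

        if ($stat['repeats'] >= $this->threshold) {
            $this->cache[$queryId] = $result;            // result looks stable: cache it
        }
        return $result;
    }

    public function clearQuery(string $queryId): void { unset($this->cache[$queryId], $this->stats[$queryId]); }
    public function clearAll(): void                  { $this->cache = []; $this->stats = []; }
}

// Usage: run the same query id through fetch(); after enough identical
// results it is served from the cache without hitting the database.
$cache = new AdaptiveQueryCache(2);
$rows  = $cache->fetch('articles:front-page', fn () => ['row1', 'row2']);
```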

This might add slightly more overhead than some cache systems, but if you just want caching to do its own thing, I think it would work well without running too much amok.
