I have a php web application where certain data changes on a weekly basis but is read very frequently often.
The SQL queries that retrieve the data and the php code for html output are fairly complex. There are multiple table joins, and numerous calculations - but they result in a fairly basic html table. Users are grouped, and the table is the same for each group each week, but different for different groups. I could potentially have hundreds of tables for thousands of users.
For performance reasons, I'd like to cache this data. Rather than running these queries and calculations every time someone hits the page, I want to run a weekly process to generate the table for each group giving me a simple read when required.
I'd be interested to know what techniques you've used successfully or unsuccessfully to achieve something like this?
Options I can see include:
- Storing the html result of the calculations in a MySQL table, identified by user group
- Storing the resultant data in a MySQL table, identified by user group (difficult as there's no fixed number of data items)
- Caching the page output in static files
Any other suggestions would be welcome!
In the function to generate the table, make it store the result to a file on disk:
You don't necessarily have to run a weekly batch job for it, when calling the function to get the data, check if the cache is out of date (or non-existent). If so, generate and cache the results then. If not, just return the cached file.
The
cacheIsStale()
function could just look at the file's timestamps to test for freshness.It seems you already have most of it covered.
One other option, assuming the table data is not huge, is to use memcache to cache the results - this would probably be the faster solution, although you would need to check memory requirements to see if it's a viable option.
There are indeed a few options:
first of all, profile. verify that those queries are really consuming a significant amount of time. maybe MySQL query result caches has already done the work for you.
if they are really consuming resources, what i would do is to create a table with the computed results, and a procedure that do all needed managing, to be called when the data changes. those frequent reads should go only to the pre-computed data, without bothering to check if it's still valid.
simply add some hooks to the procedures that modify the base data, or database triggers if you can, these would be executed unfrequently (weekly?), and could take a lot of time to generate any results.