We just did a reskin of our website, and in the aftermath of the deploy we're having to make a number of small tweaks to the various CSS and JavaScript files that control the new look and feel. One of the problems we're encountering is that browsers cache those files, so a user's browser may not pick up some of the fixes we make.
We originally thought of doing something with the mtime of the file, so as a 'quick fix' hack we modified some of the main calls to those files with a URL parameter, using server-side PHP to append the latest mtime of the file in question. But this doesn't work as planned, because our deploy system clones a git branch and thus every single deploy touches every file and alters its mtime. (The end result is that deploying a minor fix tells client browsers to reload every <link> or <script> tag modified this way, whether the file contents actually changed or not.)
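In essence the hack looks like this (paths simplified, just sketching what's described above):

    <?php
    // Append the file's mtime so the URL changes whenever the file does -
    // except our deploy re-clones the repo and touches every mtime.
    function asset_url($path) {
        return $path . '?' . filemtime($_SERVER['DOCUMENT_ROOT'] . $path);
    }
    ?>
    <link rel="stylesheet" href="<?php echo asset_url('/css/main.css'); ?>">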
I was looking at Apache's FileETag, but it seems to use mtime, size and inode (I was hoping to find something like a checksum). mtime obviously doesn't work with the deploy method we use, and size may not change if an alteration doesn't change the file's length. I'm not as familiar with how the inode info works, so I'm not sure whether that will fit our needs.
So I am wondering if anyone has any other suggestions that would tell a client browser to only reload a file if its contents have actually changed.
If you have gulp.js, use gulp-cache-bust, or use gulp-change to update the version with code like this:
gulpfile.js:
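A minimal sketch with gulp-change (gulp 4 syntax; the ?v= token and file paths are assumptions, adjust to your layout):

    const gulp = require('gulp');
    const change = require('gulp-change');

    // Hypothetical version stamp: here the build time, but your release
    // number works just as well.
    const VERSION = Date.now();

    // Rewrite every ?v= token on .css/.js URLs in the HTML so browsers
    // see a new URL (and so re-download) only when you run a build.
    function bumpVersion(content) {
        return content.replace(/\.(css|js)\?v=[^"']*/g, '.$1?v=' + VERSION);
    }

    function bust() {
        return gulp.src('./src/**/*.html')
            .pipe(change(bumpVersion))
            .pipe(gulp.dest('./dist'));
    }

    exports.default = bust;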
html tag:
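For example (version value illustrative):

    <link rel="stylesheet" href="css/main.css?v=1510000000000">
    <script src="js/app.js?v=1510000000000"></script>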
So there are two important distinctions with caching that people often confuse:
1. How long the browser should cache resources. This is usually driven off the Cache-Control header (though the older Expires header is sometimes still used too). No request is made to the server during this time; only once the resource is considered stale does the browser ask again. This is what you want as much as possible, as it avoids the costs of bandwidth and latency.

2. How the browser should revalidate expired resources. This is driven off the Last-Modified or ETag header. Here a request is made to the server, and the server returns a 304 response saying "I'm not sending you a new version of that file: based on the Last-Modified or ETag details you gave me, the copy you have is still the latest version, so consider it good for another while". This is really only useful for large resources (e.g. large images or videos). CSS and JS files are typically small, and the time to download them is not much more than the time to send a 304 response, so you don't gain that much from 304s here. The latency of getting any response back from the server (be it a 304 or the resource itself) is the main bottleneck, and that can only be avoided by keeping the file in the cache longer (see the example exchange below).
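To make the two concrete, a typical exchange looks something like this (URL and dates illustrative):

    # First visit: full download, cacheable for an hour
    GET /css/main.css HTTP/1.1

    HTTP/1.1 200 OK
    Cache-Control: max-age=3600
    Last-Modified: Mon, 02 Nov 2015 10:00:00 GMT

    ...css contents...

    # An hour later: the browser revalidates instead of re-downloading
    GET /css/main.css HTTP/1.1
    If-Modified-Since: Mon, 02 Nov 2015 10:00:00 GMT

    HTTP/1.1 304 Not Modified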
You seem to want to use some validation technique (modified time or ETag) to give you good caching but with instant reloading when you do change something. That is not the best approach, as it relies on the second mechanism above when it's the first that gives you the benefit. Great article on why latency is the issue, by the way: http://www.nateberkopec.com/2015/11/05/page-weight-doesnt-matter.html. Lots of other good info in that article, so I do recommend reading it all.
So you ultimately want really long Cache-Control lifetimes, so that after the first download the file stays in the cache. But then you also want to force users to reload as quickly as possible when you change things. Those are opposing requirements.
So your options are to either use cache-busting techniques, or to basically ignore the issue.
Cache-busting techniques basically fool the browser into thinking it's asking for a different resource, for example by adding parameters to the URL. CSS-Tricks has a great page on various techniques here: https://css-tricks.com/strategies-for-cache-busting-css/. Given your deployment method, anything based on mtime won't work, so you really need to add a version number here instead; how easy that is depends on how you create pages and on your build process.
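Since you're already injecting the parameter with PHP, a minimal sketch of the version-number variant (the APP_VERSION constant is hypothetical; bump it only when assets change):

    <?php
    // Hypothetical release identifier, updated by hand (or by the deploy
    // script) only when asset contents actually change.
    define('APP_VERSION', '2016.03.1');

    // Append the version instead of the file's mtime, so re-cloning the
    // repo on deploy no longer invalidates unchanged files.
    function asset_url($path) {
        return $path . '?v=' . APP_VERSION;
    }
    ?>
    <link rel="stylesheet" href="<?php echo asset_url('/css/main.css'); ?>">
    <script src="<?php echo asset_url('/js/app.js'); ?>"></script>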
The other option is to ignore the issue until the resources have expired. This might seem odd, but it's increasingly my preferred option, so let me explain why. What I do is set a very short expiry time for key resources that are liable to change (e.g. HTML, CSS and JS) - say an hour or three - then accept that people will get the cached, potentially stale resource for that time. Most recommendations are for a long expiry time in the hope of caching your page for a while, but given that caches aren't as big as some people think (again, see the article referenced above), that benefit is mostly not seen IMHO except by very frequent visitors. So you have the hassle of a long expiry with none of the benefits. The upside of short expiries is that changes roll out by themselves within that window, with no cache-busting machinery in your build at all. The downsides are that users can see stale content for up to that window, and that repeat visitors generate a few more requests than they would with a long expiry.
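In Apache, a sketch of that setup using mod_expires (types and times illustrative):

    # Short, uniform expiry for resources that change often
    ExpiresActive On
    ExpiresByType text/html "access plus 1 hour"
    ExpiresByType text/css "access plus 3 hours"
    ExpiresByType application/javascript "access plus 3 hours"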
And ETags are pretty useless in my view due to their poor implementation. Apache uses a combination of file size and modified time by default, which, as you rightly point out, may not identify whether a file's contents have changed. The inode is basically the file's reference on disk, so it's similar. It's also not recommended when using a load balancer and multiple web servers, as the inode will differ on each (which is why it's no longer used by default). A hash of the contents would be better (though potentially slower). But the main issue with ETags on Apache is that they don't work when you gzip your data (which you should do)! See my blog post about this: https://www.tunetheweb.com/performance/http-performance-headers/etag/. You should not use ETags on Apache for this reason. They are also fairly useless for small resources (like the text ones you will want to zip), as discussed above. Last-Modified is almost as good and doesn't have the bug with gzipped resources.
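If you want to switch ETags off on Apache, something like this does it (assuming mod_headers is enabled):

    # Stop Apache generating ETags, and strip any header already set
    FileETag None
    Header unset ETag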
Having proper caching is one of the biggest performance improvements you can make to your site. Without it, a site can feel slow and laggy as you browse around, no matter how fast your server or the visitor's computer. With it, a site can feel zippy and responsive even with a slow server. However, it does complicate things, as you've noticed, so it requires a good bit of thought to set up properly.
I go into a bit more detail on how to configure Apache for it here: https://www.tunetheweb.com/performance/http-performance-headers/caching/
Hope that helps - even if it is a bit long-winded and unfortunately doesn't give you the quick fix you were no doubt looking for!
If your application has a version number, then you can build it into the filename and use some sort of filter on the server to strip the version number back out and serve the underlying file (see the rewrite sketch below the example).
Example:
/main_v1.css
/main_v2.css
This busts the cache whenever the version number changes, since the browser sees a brand-new URL.
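On Apache, for instance, a mod_rewrite rule along these lines could do the stripping (a sketch; the _vN naming is just this example's convention):

    # Serve /main_v2.css (and any later version) from the real /main.css
    RewriteEngine On
    RewriteRule ^(.+)_v\d+\.(css|js)$ $1.$2 [L]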
As an alternative you could try adding a query-string parameter to the requests. I hear this is less reliable with some proxies and caches, though.
Example:
/main.css?v=1
/main.css?v=2