I was fooling around with ways of caching my website's assets and noticed most websites similar to mine use query strings to override caching (e.g.: /css/style.css?v=124942823)
Afterwards, I noticed that whenever I saved my style.css file, the last-modified headers were "updated", making the query string unnecessary.
So I wonder:
- Why do so many websites use the "query string" method, instead of just letting the last-modified header do its work?
- Should I unset the Last-modified header and just work with query strings? (Is there any particular advantage to this?)
TL;DR
Changing the query string changes the url, ensuring content is "fresh".
No. Though that's almost the right answer.
There are three basic caching strategies used on the web: no caching, validation requests, and caching forever.
To illustrate all three, consider the following scenario:
A user accesses a website for the first time, loads ten pages and leaves. Each page loads the same css file. For each of the above caching strategies how many requests would be made?
No caching: 10 requests
In this scenario nothing else influences the result: 10 requests for the css file result in it being sent to the client (browser) 10 times.
Advantages
Disadvantages
Validation requests: 10 requests
If Last-Modified or ETag is used, there will also be 10 requests. However, 9 of them will transfer only headers, with no body. Clients use conditional requests to avoid re-downloading something they already have. Take, for example, the css file for this site.
The very first time the file is requested, the following happens:
A subsequent request for the same url would look like this:
Note there is no body, and the response is a 304 Not Modified. This is telling the client that the content it already has (in local cache) for that url is still fresh.
That's not to say this is the optimal scenario. Tools such as the network tab of Chrome developer tools let you see exactly how long a request takes and what it spends that time doing:
Because the response has no body, the response time will be much lower, as there is less data to transfer. But there is still a response, and there is still all of the overhead of connecting to the remote server.
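To make the validation exchange described above concrete, here is a minimal sketch of the decision a server makes for a conditional request. The function name `respond` and its parameters are hypothetical, not taken from any particular server, and real servers handle more cases (weak ETags, multiple ETag values, and so on).

```python
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, etag, last_modified_ts, body):
    """Return (status, headers, body) for a GET of one static file.

    Hypothetical helper: a simplified take on conditional request handling.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }

    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if if_none_match == etag:
            return 304, headers, b""  # client copy is still valid, no body
        return 200, headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full response
        if since is not None and int(last_modified_ts) <= since:
            return 304, headers, b""

    return 200, headers, body
```

The first call with no conditional headers returns a 200 with the body; repeating the request with the `Last-Modified` value echoed back as `If-Modified-Since` returns a 304 with an empty body, which is the round trip that still costs a connection.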
Advantages
Disadvantages
Caching forever: 1 request
If there are no ETags, no Last-Modified header, and only an Expires header set far in the future, only the very first access to a url will result in any communication with the remote server. This is a well-known best practice for better frontend performance. In that case, for subsequent requests a client will read the content from its own cache and not communicate with the remote server at all.
This has clear performance advantages, which matter most on mobile devices, where latency can be significant (to put it mildly).
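As a rough illustration of "caching forever", the sketch below builds the response headers for a far-future expiry. The helper name `far_future_headers` and the one-year lifetime are assumptions for the example; `Cache-Control: max-age` is included as the HTTP/1.1 counterpart of the Expires header mentioned above.

```python
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # assumed lifetime, in seconds


def far_future_headers(now=None, max_age=ONE_YEAR):
    """Build headers telling clients to reuse the response without asking again.

    Hypothetical helper; deliberately sets no ETag or Last-Modified.
    """
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

With headers like these, the url itself has to change whenever the content changes, since the client will not come back to check.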
Advantages
Disadvantages
Don't use query strings for cache busting
Sites use a query argument to circumvent a client's cache. When the content changes (or if a new version of the site is published) the query argument is modified, and therefore a new version of that file will be requested as the url has changed. This is less work and more convenient than renaming the file every time it changes. It is not, however, without its problems.
Using query strings prevents proxy caching. In the quote below, the author demonstrates that a request from browser<->proxy cache server<->website does not use the proxy cache:
This shouldn't be taken lightly. When accessing a website physically located on the other side of the world, response times can be very slow. Getting an answer from a proxy server located along the route can mean the difference between a website being usable or not. Without a proxy cache, cached-forever resources are slow on the first load of a url, and with validation requests the whole site will be sluggish.
Instead version-control assets
The "best" solution is to version control files such that whenever the content changes so does the url. Normally that would be automated as part of the build process.
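One way a build step might derive such a url is to embed a hash of the file's content in its name, so the name changes exactly when the content does. This is a minimal sketch under that assumption; the function name `fingerprinted_name` and the 8-character digest length are illustrative choices, not part of any specific build tool.

```python
import hashlib
from pathlib import Path


def fingerprinted_name(path, digest_length=8):
    """Return a file name with a content hash inserted before the extension.

    Hypothetical helper: style.css -> style.<hash>.css
    """
    path = Path(path)
    # Hash the bytes of the file so identical content yields identical names.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_length]
    return f"{path.stem}.{digest}{path.suffix}"
```

A build script would copy or rename each asset to its fingerprinted name and write that name into the generated HTML.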
However a near-compromise to that is to implement a rewrite rule such as
In this way a request for foo.123.css is processed by the server as foo.css. This has all the advantages of using a query parameter for cache busting, but without the problem of disabling proxy caching.

The Last-Modified header is applied differently across browsers, but generally the browser will issue a conditional GET request that the server must respond to if the cache needs to be updated. For example, in Firefox...
By setting a timestamp (or fingerprint), you explicitly let the browser know when it needs to update its cache and you can then set very long expiration times.
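For illustration, the timestamp variant in the query-string form from the question could be generated from the file's modification time. This is only a sketch; the helper name `timestamped_url` and the `v` parameter name are assumptions for the example.

```python
import os


def timestamped_url(url_path, file_path):
    """Append the file's last-modified time to its url as a query argument.

    Hypothetical helper: /css/style.css -> /css/style.css?v=<mtime>
    """
    mtime = int(os.path.getmtime(file_path))
    return f"{url_path}?v={mtime}"
```

Saving the file changes its modification time, so the generated url changes too and clients fetch the new version despite a long expiration time.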
It may be worth noting that the documentation on the rails asset pipeline (http://guides.rubyonrails.org/asset_pipeline.html) cites 3 advantages for fingerprinting over a querystring timestamp:
For more detail and best practices on caching: https://developers.google.com/speed/docs/best-practices/caching