I have set cache headers far in the future (one year from now) and have disabled ETags as advised by YSlow (http://developer.yahoo.com/performance/rules.html#etags), but Google PageSpeed seems to require an ETag (or Last-Modified) even after the cache headers are set.
"It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources."
The two rules seem to conflict with each other.
YSlow does not advise removing ETags in general, only in some environments (such as multi-server setups where the default ETag differs from server to server). If you do not use ETags, you should use Last-Modified instead.
ETag and Last-Modified are for conditional GET requests, which are used when re-requesting an already cached and possibly expired resource.
Cache-Control max-age defines how long a cached item is considered valid without asking the server again. (Once it has expired under this rule, the browser will make a conditional GET ...)
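To make the split between the two kinds of headers concrete, here is a minimal Python sketch that builds both for a single response. The function name `caching_headers`, the truncated SHA-256 ETag scheme, and the `public` directive are assumptions made for illustration; your server or framework will have its own way of setting these.

```python
import hashlib
from email.utils import formatdate

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def caching_headers(body: bytes, mtime: float, max_age: int = ONE_YEAR_SECONDS) -> dict:
    """Build response headers carrying both freshness info and validators.

    Hypothetical helper: the name and the ETag scheme are illustrative only.
    """
    return {
        # Freshness: the browser may reuse its copy this long without asking.
        "Cache-Control": f"public, max-age={max_age}",
        # Validators: only used once the copy has to be revalidated.
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
        "Last-Modified": formatdate(mtime, usegmt=True),
    }
```

Calling `caching_headers(file_bytes, file_mtime)` gives you one freshness header and two validators, which is the combination the PageSpeed rule asks for.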
So in your case:
- The browser caches the resource for one year. Within that year, no request for this resource is made at all; it is served directly from the local cache. (Uses the Cache-Control header settings.)
- After the year has expired, the browser makes a conditional request to check whether something changed. The server responds with HTTP 304 and an empty body when nothing changed. In that case the browser continues to use its cached item without the need for retransmission. (Uses the ETag and/or Last-Modified header settings.)
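The second step can be sketched from the server's side roughly like this. It is a simplified illustration, not a complete implementation: `handle_get` and `_strip_weak` are hypothetical names, header lookup is assumed to be case-sensitive, and the comma split assumes the entity tags themselves contain no commas.

```python
from typing import Dict, Tuple


def _strip_weak(tag: str) -> str:
    """Drop the weak-validator prefix so W/"x" and "x" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(
    request_headers: Dict[str, str], etag: str, body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on a resource with a known ETag."""
    headers = {"ETag": etag, "Cache-Control": "public, max-age=31536000"}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Naive parsing: assumes no commas inside the quoted tags.
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        if "*" in candidates or _strip_weak(etag) in candidates:
            # Nothing changed: empty body, the browser keeps its cached copy.
            return 304, headers, b""
    # No validator sent, or it did not match: send the full resource.
    return 200, headers, body
```

A request carrying `If-None-Match` with the current ETag gets a 304 and no body; any other request gets the full 200 response.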
(The browser may or may not honor your headers. For example, a browser may make a conditional request even though the year has not expired yet.)
For highly optimized sites, Cache-Control is far more important, because you set far-future expiry headers and simply change the URL of the resource whenever it changes. While this bypasses conditional requests, it lets you be extremely aggressive when defining the expiry header while still being able to serve new versions of the resource immediately to everybody at the same time. Because of the new URL, it looks like a new resource from the browser's point of view.
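One common way to "change the URL when the resource changes" is to put a content hash into the file name. The sketch below shows the idea; `fingerprinted_url`, the eight-character digest length, and the `name.<hash>.ext` layout are all assumptions for illustration, and build tools typically do this for you.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_url(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension.

    Hypothetical helper: a new hash (and so a new URL) appears only
    when the content itself changes.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

For example, `fingerprinted_url("/static/application.js", js_bytes)` yields something of the form `/static/application.<hash>.js`. As long as the bytes stay the same the URL stays the same, so the one-year cache entry keeps being used; once they change, the page references a URL the browser has never seen.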
For Java, there is a framework called jawr that makes use of these and other concepts without a negative impact on your site development.
ETag and Cache-Control headers are not mutually exclusive. The reason the page you linked to recommends removing ETags is to reduce the size of the HTTP headers, which will at best save you a few bytes. Here's a use case showing where and why it still makes sense to have both:
- You serve application.js with a one-week expiry date and an ETag fingerprint.
- A week passes and the user comes back to your site: the file has expired, so the browser dispatches a conditional request. If the file has not been modified, the server replies with 304 Not Modified and the browser skips downloading the file again. (Last-Modified works too.)
If you don't provide an ETag or Last-Modified, the browser has to request and download the entire file.
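Seen from the client, the difference comes down to which request headers a cache can attach when it revalidates. This is a rough sketch of that decision, assuming the cached response headers are available as a plain dict; `revalidation_headers` is a made-up name and real browsers do considerably more.

```python
from typing import Dict


def revalidation_headers(cached_response_headers: Dict[str, str]) -> Dict[str, str]:
    """Build conditional request headers from the validators stored with a cached copy."""
    request_headers: Dict[str, str] = {}
    etag = cached_response_headers.get("ETag")
    if etag:
        request_headers["If-None-Match"] = etag
    last_modified = cached_response_headers.get("Last-Modified")
    if last_modified:
        request_headers["If-Modified-Since"] = last_modified
    # An empty result means the request is unconditional, so the server
    # has no basis for a 304 and must send the whole file again.
    return request_headers
```

With neither validator stored, the function returns an empty dict, which is exactly the "download the entire file" case.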
Good related resource: https://developers.google.com/speed/articles/caching