I have finished a small PHP application that can serve many documents. These documents must be cacheable by clients and proxies.
Since proxies can cache my responses, I must be extra careful: the documents I serve can have different MIME types (content negotiation based on $_SERVER['HTTP_ACCEPT']) and different languages (chosen, in this order, from: $_POST value / $_GET value / URL / PHP session value / $_COOKIE value / $_SERVER['HTTP_ACCEPT_LANGUAGE'] / default script value).
In short, a page can be served with many MIME types and in many languages from the same URL (question changed: see edit below).
To help proxies cache, I use the "Vary: Accept" header in combination with the ETag header. The ETag is an MD5 hash of the current language and the last-modified timestamp.
I always:
- Send an Expires header
- Send a Cache-Control header
- Send a Last-Modified header
- Send a Content-Type header
- Send an ETag header (based on current language and Last-Modified timestamp)
- Send a Content-Language
- Send a "Vary: Accept" header if the document is XHTML
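For illustration, the header list above could be assembled roughly like this. It is only a sketch: the function name build_cache_headers() and its parameters are hypothetical, and it returns an array instead of calling header() directly so the result can be inspected.

```php
<?php
// Hypothetical helper (not from the application itself): collects the
// response headers from the list above into a name => value array.
function build_cache_headers($language, $lastModified, $contentType, $maxAge, $now)
{
    $headers = array(
        // HTTP dates are always expressed in GMT.
        'Expires'          => gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
        'Cache-Control'    => 'public, max-age=' . $maxAge,
        'Last-Modified'    => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
        'Content-Type'     => $contentType . '; charset=UTF-8',
        // ETag as described: MD5 of the language and the last-modified timestamp.
        'ETag'             => '"' . md5($language . $lastModified) . '"',
        'Content-Language' => $language,
    );
    // "Vary: Accept" is only sent when the document is served as XHTML.
    if ($contentType === 'application/xhtml+xml') {
        $headers['Vary'] = 'Accept';
    }
    return $headers;
}
```

To actually send them, one would loop over the array and call header($name . ': ' . $value) for each entry before any output.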
Now my question: is this enough to help caching on proxies and clients? Did I miss anything, such as a header?
To help you, here’s the HTTP response header for a test page (on my local environment):
"
Date Wed, 30 Dec 2009 18:56:26 GMT
Server Apache/2.0.63 (Win32) PHP/5.1.0
X-Powered-By PHP/5.1.0
Set-Cookie Tests=697daqbmple2e1daq2dg74ur96; path=/
Expires Wed, 30 Dec 2009 21:56:26 GMT
Cache-Control public, max-age=10800
Last-Modified Mon, 28 Dec 2009 15:11:49 GMT
Etag "44fa50be4638161a596e4b75d6ab7a94"
Vary Accept
Content-Language en-us
Content-Length 3043
Keep-Alive timeout=15, max=100
Connection Keep-Alive
Content-Type application/xhtml+xml; charset=UTF-8
"
EDIT: OK, I understand that in this case serving a document with many MIME types and different languages (which can come from so many sources, see above) from a single URL is just plain bad design. If you want to do this, just use a "private" cache (no caching on proxies)... Am I correct?
If each language has its own URL (but each URL can still be served with many MIME types), is my current implementation OK for a "public" cache (caching on clients + proxies)?
Since your output also depends on things a proxy cannot know, like session data, wouldn't it be easier to send a (non-cacheable) redirect to the actual content? That content would be fixed for a given URL (with parameters) and therefore much easier to cache. I know this involves an extra round-trip, but it's probably much less error-prone and would also cause fewer problems with proxies that don't completely understand/support all your header combinations.
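A rough sketch of that redirect idea, under some loud assumptions: the function names are made up, the "/<language>/<path>" URL layout is just one possible scheme, and the candidate list stands in for the POST / GET / session / cookie lookups done by the application.

```php
<?php
// Hypothetical: pick the first usable language from an ordered list of
// candidates (e.g. POST, GET, session, cookie values), else the default.
function resolve_language(array $candidates, $default)
{
    foreach ($candidates as $candidate) {
        if (is_string($candidate) && $candidate !== '') {
            return $candidate;
        }
    }
    return $default;
}

// Hypothetical: describe a non-cacheable redirect to a language-specific URL.
// The "/<language>/<path>" layout is an assumption made for this example.
function language_redirect($path, $language)
{
    return array(
        'status'  => 302,
        'headers' => array(
            'Location'      => '/' . $language . '/' . ltrim($path, '/'),
            // The redirect itself depends on per-user state, so keep it out of caches.
            'Cache-Control' => 'private, no-store',
        ),
    );
}
```

The page behind the language-specific URL no longer depends on cookies or session data, so it can be sent with "Cache-Control: public" and only needs to vary on Accept.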
Also, I'm guessing that, if you have two clients going through the same proxy but with different language cookies, your current method would return two different ETags for the same URL, which would make the proxy update its copy each time it sees the other client.
I believe you should be fine in principle -- adding the Vary header means that caches should hold multiple instances of your data, keyed by ETag.
I would note, though, that you don't only vary on Accept, you also vary on Cookie and Accept-Language. Varying by cookie means that the proxy will have to validate every request, but should be able to use an If-None-Match header to let the server indicate which (already cached) ETag should be used.
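On the server side, answering such a validation request could look something like the sketch below. The function name etag_matches() is hypothetical, and it assumes ETags that contain no commas (true for the MD5-based ones in the question).

```php
<?php
// Hypothetical helper: returns true when the client's If-None-Match header
// lists the ETag of the current representation (weak comparison, so a
// leading "W/" is ignored). Assumes the ETag values contain no commas.
function etag_matches($ifNoneMatch, $currentEtag)
{
    $ifNoneMatch = trim($ifNoneMatch);
    if ($ifNoneMatch === '') {
        return false;
    }
    if ($ifNoneMatch === '*') {
        return true; // "*" matches any existing representation
    }
    $normalize = function ($tag) {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
    };
    $current = $normalize($currentEtag);
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if ($normalize($candidate) === $current) {
            return true;
        }
    }
    return false;
}
```

When it returns true for the value of $_SERVER['HTTP_IF_NONE_MATCH'], the script can send a "304 Not Modified" status with the same ETag and no body; otherwise it sends the full response.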
If the response varies both on "Accept" and "Accept-Language", then both need to be mentioned in the "Vary" response header.
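As a small sketch of that, the Vary value can be built from whichever request headers actually influenced the response. The helper name build_vary_header() is hypothetical.

```php
<?php
// Hypothetical helper: joins the names of the request headers that influenced
// the response into a single Vary value, dropping blanks and case-insensitive
// duplicates while keeping the first spelling seen.
function build_vary_header(array $requestHeaderNames)
{
    $seen = array();
    $names = array();
    foreach ($requestHeaderNames as $name) {
        $name = trim($name);
        $key = strtolower($name);
        if ($key === '' || isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;
        $names[] = $name;
    }
    return implode(', ', $names);
}
```

For a response negotiated on both MIME type and language, header('Vary: ' . build_vary_header(array('Accept', 'Accept-Language'))) would send "Vary: Accept, Accept-Language".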