What heuristics do browsers use to cache resources?

Posted 2019-01-13 16:44

Question:

13.2.2 Heuristic Expiration

Since origin servers do not always provide explicit expiration times, HTTP caches typically assign heuristic expiration times, employing algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time. The HTTP/1.1 specification does not provide specific algorithms, but does impose worst-case constraints on their results. Since heuristic expiration times might compromise semantic transparency, they ought to be used cautiously, and we encourage origin servers to provide explicit expiration times as much as possible. (HTTP/1.1, RFC 2616)

What are the algorithms used by browsers to estimate plausible expiration times?

The ideal answer will cover all major browsers with evidence from source code or official blog posts.

Answer 1:

From Chromium's source code: https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_response_headers.cc&l=1082&rcl=1421094684

  if ((response_code_ == 200 || response_code_ == 203 ||
       response_code_ == 206) && !must_revalidate) {
    // TODO(darin): Implement a smarter heuristic.
    Time last_modified_value;
    if (GetLastModifiedValue(&last_modified_value)) {
      // The last-modified value can be a date in the future!
      if (last_modified_value <= date_value) {
        lifetimes.freshness = (date_value - last_modified_value) / 10;
        return lifetimes;
      }
    }
  }
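
As a worked example of the rule in the excerpt above, here is a minimal sketch. It is not Chromium code: the name `HeuristicFreshnessSeconds` and the use of plain Unix-second timestamps are assumptions made for illustration, and it mirrors only the Last-Modified branch shown. The excerpt does not show what happens when Last-Modified lies in the future, so returning 0 in that case is also an assumption.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Chromium): applies the "10% of the
// resource's age" rule from the excerpt to Unix timestamps in seconds.
// Returns 0 when Last-Modified is later than Date; that fallback is an
// assumption, since the excerpt does not show that path.
constexpr std::int64_t HeuristicFreshnessSeconds(std::int64_t date_unix,
                                                 std::int64_t last_modified_unix) {
  return last_modified_unix <= date_unix
             ? (date_unix - last_modified_unix) / 10
             : 0;
}
```

For a response whose Date is 10 days (864000 seconds) after its Last-Modified, this yields 86400 seconds, so the resource would be treated as fresh for one day.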


Answer 2:

This blog post says that Internet Explorer 9 uses max-age = (DownloadTime - LastModified) * 0.1: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx

This is effectively the same as Mozilla's heuristic (that page is rather old, and I don't know whether it has changed since): https://developer.mozilla.org/en-US/docs/HTTP_Caching_FAQ



Answer 3:

Let's assume all browsers we are interested in are Internet Explorer 8 or newer (e.g. IE5 has some terrible behaviour with caching headers).

There is only ONE standards-based way of controlling caching (introduced with HTTP/1.1): the Cache-Control HTTP header.

Since at least 1996 IE has been using an opt-out policy for caching HTTPS content.

Seemingly since its introduction Chrome has done opt-out for HTTPS (i.e. it will cache it unless told not to). In 2011 Firefox 4 (but not Safari) switched to opt-out caching for HTTPS content. Source.

Recommendations

  1. Only use HTTP headers to control browser caching. If you decide to go against this be aware that IE only recognizes two cache control directives that are set inside HTML:

    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">
    <META HTTP-EQUIV="Expires" CONTENT="-1">
    

    and seemingly only the former is useful in the HTTPS scenario. Further, there can be problems when trying to use Pragma in IE. Finally, Chrome ignores cache directives in meta tags, reducing their usefulness even further.

  2. Don't use the Expires header. In modern browsers Expires is superseded by Cache-Control. Expires: 0 and Pragma: no-cache are technically invalid response headers. Yes, they have existed since the beginning but not all modern browsers (e.g. Chrome) use them and they have been superseded by Cache-Control.

  3. The Vary header is a minefield: see how Vary behaves in older IEs and how Vary behaves with XHR. Finding out the details is left as an exercise for the reader, and leaves the impression that it is preferable to use different URLs for different content...

  4. Allow the browser to make conditional requests by setting ETags. ETags allow a browser to do a lightweight check to see whether the content has changed, so it can avoid making a full request if it hasn't.

  5. Be aware some browsers are just broken and need hacks. IE 8 can have issues downloading files which it has been told not to cache.
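
To illustrate the conditional requests mentioned in recommendation 4, here is a simplified sketch of the server-side decision. The function name `StatusForConditionalGet` is hypothetical, and the comparison is deliberately reduced: it handles only a single ETag or `*` in If-None-Match, and ignores comma-separated lists and weak validators (the `W/` prefix), which a real server would need to handle.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decides which status a server would send for a GET
// that may carry an If-None-Match header (empty string means "not sent").
// Returns 304 (Not Modified) when the client's cached ETag still matches,
// otherwise 200 so that the full body is sent again.
inline int StatusForConditionalGet(const std::string& if_none_match,
                                   const std::string& current_etag) {
  // The resource needs a validator for the comparison to be meaningful.
  assert(!current_etag.empty());
  if (!if_none_match.empty() &&
      (if_none_match == "*" || if_none_match == current_etag)) {
    return 304;
  }
  return 200;
}
```

With a current ETag of `"abc"` (quotes included, as they are part of the header value), a request carrying `If-None-Match: "abc"` gets 304 and no body, while a request with a stale or missing ETag gets 200.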

Browser caching algorithms

  • Chrome 49.0.2606.2 HttpResponseHeaders::GetFreshnessLifetimes()
  • Firefox HTTP Caching FAQ, Firefox 38 ESR nsHttpResponseHead::ComputeFreshnessLifetime()
  • Internet Explorer (6+?), HTTPS caching in IE 8+, Internet Explorer 9+, Internet Explorer 9+.
  • WebKit (Safari) computeFreshnessLifetimeForHTTPFamily()

See also

  • Google's browser caching recommendations.


Answer 4:

It seems that WebKit ("...the OS X system framework version of the engine that's used by Safari...") uses the same heuristic as Chromium.

The following is taken from CacheValidation.cpp:

return (creationTime - lastModifiedValue) * 0.1;


Answer 5:

Gecko estimates expiration at now + (now - lastModified)/10, last I checked.
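
The formula above can be written out as a small sketch. This is not Gecko code: `EstimatedExpirationUnix` is a hypothetical name, timestamps are assumed to be Unix seconds, and "now" is assumed to mean the time at which the response was received.

```cpp
#include <cstdint>

// Hypothetical helper (not Gecko code): returns the estimated absolute
// expiration time, now + (now - lastModified) / 10, in Unix seconds.
// No clamping is applied, so a Last-Modified in the future produces an
// expiration time earlier than now_unix.
constexpr std::int64_t EstimatedExpirationUnix(std::int64_t now_unix,
                                               std::int64_t last_modified_unix) {
  return now_unix + (now_unix - last_modified_unix) / 10;
}
```

For example, a resource last modified 1000000 seconds before it was fetched would be considered fresh for a further 100000 seconds (a little over a day) after the fetch.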