Is the HttpWebResponse.LastModified accurate? Is it always present? My project is to create a sort of focused web crawler, and I am stuck on whether to use a hash of the resource or just the HttpWebResponse.LastModified property to check the resource's "freshness".
Using the hash value means streaming the resource every time it's checked. This has a big impact on overall performance.
If I just check HttpWebResponse.LastModified, is it accurate?
HttpWebResponse.LastModified returns the value of the HTTP Last-Modified response header.
HTTP response headers are set by the HTTP server sending the response. It's completely up to the server whether it sets the Last-Modified response header, and whether it sets it to an accurate value or not.
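Because the header is optional, a crawler probably wants to know whether it was sent at all before trusting the date. As far as I know, the LastModified property falls back to the current date and time when the header is missing, so reading the raw header is the safer check. A minimal sketch, assuming the raw value comes from response.Headers["Last-Modified"]; the LastModifiedHeader helper is a hypothetical name, not part of .NET:

```csharp
using System;
using System.Globalization;

static class LastModifiedHeader
{
    // Hypothetical helper: parses a raw Last-Modified header value in the
    // preferred HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    // Returns null when the header is missing or not in that form
    // (the obsolete HTTP date formats are not handled in this sketch).
    public static DateTimeOffset? TryParse(string rawHeaderValue)
    {
        if (string.IsNullOrWhiteSpace(rawHeaderValue))
        {
            return null;
        }

        DateTimeOffset parsed;
        bool ok = DateTimeOffset.TryParseExact(
            rawHeaderValue.Trim(),
            "r",                          // RFC 1123 pattern
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);

        return ok ? parsed : (DateTimeOffset?)null;
    }
}
```

Usage would look something like LastModifiedHeader.TryParse(response.Headers["Last-Modified"]); a null result means the server gave you nothing usable and you need another freshness signal.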
The Last-Modified response header is part of the Validation Model for Caching in HTTP. It is usually used in conjunction with the If-Modified-Since request header. You might want to read HTTP/1.1, part 6: Caching for the details.
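For a crawler, the If-Modified-Since pairing means you can ask the server to do the comparison instead of downloading and hashing the body. A rough sketch using HttpWebRequest.IfModifiedSince; the ConditionalFetch class and HasChangedSince method are hypothetical names, and the code assumes the classic behaviour where a 304 response surfaces as a WebException:

```csharp
using System;
using System.Net;

static class ConditionalFetch
{
    // Hypothetical helper: returns false when the server answers
    // 304 Not Modified for the given date, true for any other
    // successful response. Other errors are rethrown.
    public static bool HasChangedSince(string url, DateTime lastSeen)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.IfModifiedSince = lastSeen; // sent as If-Modified-Since

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // Some configurations return the 304 without throwing.
                return response.StatusCode != HttpStatusCode.NotModified;
            }
        }
        catch (WebException ex)
        {
            // HttpWebRequest typically reports 304 as a protocol error.
            using (var response = ex.Response as HttpWebResponse)
            {
                if (response != null &&
                    response.StatusCode == HttpStatusCode.NotModified)
                {
                    return false;
                }
            }
            throw;
        }
    }
}
```

A server that ignores If-Modified-Since will simply send a normal 200 response every time, so this reports "changed" in that case, which errs on the cautious side.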
It depends on your purpose.
Last-Modified tells you that the server is happy for you to keep using a cached entity whose Last-Modified value is the same (or, by implication, later; it would be strange for a server's Last-Modified to ever go backwards, but it could happen in some multi-server situations).
E-tag is stronger (all the more if it's not a "weak" e-tag) in that it identifies the specific entity (e-tags for different language versions, different content-type versions, or different content-encoding versions will differ unless they are actually the same entity [which can happen, in a restricted set of circumstances]).
Both can be "loose": the server may consider a change insignificant and be happy for you to keep using the previous entity because it regards it as the same. The exception is "strong" e-tags, which must indicate octet-for-octet identity so that they can be used with range requests.
Both can of course just be plain wrong. Bugs happen. That said, when they are wrong it's more often in the other direction, reporting a change when none has happened. That is valid behaviour: a server is allowed to be over-cautious about freshness, which never does damage and only makes things sub-optimal.
The question, then, is whether you need to know that the server considers no change to have been made (most usage) or whether there really has been a change (pretty much restricted to diagnostic tools).
Unless you've a clear reason not to, trust last-modified and e-tag (but trust e-tag more).
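One way to put "trust e-tag more" into practice is to compare validators in order of preference and only fall back to hashing when neither is available. This is just a sketch under my own assumptions: Validators, Freshness and FreshnessCheck are hypothetical types, and e-tags are compared as plain strings without special handling of the weak "W/" prefix.

```csharp
using System;

// Hypothetical record of what the crawler stored for a resource.
sealed class Validators
{
    public readonly string ETag;                  // null when not sent
    public readonly DateTimeOffset? LastModified; // null when not sent

    public Validators(string etag, DateTimeOffset? lastModified)
    {
        ETag = etag;
        LastModified = lastModified;
    }
}

enum Freshness { Unchanged, Changed, Unknown }

static class FreshnessCheck
{
    public static Freshness Compare(Validators previous, Validators current)
    {
        // Prefer the e-tag when both sides have one.
        if (!string.IsNullOrEmpty(previous.ETag) &&
            !string.IsNullOrEmpty(current.ETag))
        {
            return string.Equals(previous.ETag, current.ETag, StringComparison.Ordinal)
                ? Freshness.Unchanged
                : Freshness.Changed;
        }

        // Otherwise use Last-Modified: a newer date means a change.
        if (previous.LastModified.HasValue && current.LastModified.HasValue)
        {
            return current.LastModified.Value > previous.LastModified.Value
                ? Freshness.Changed
                : Freshness.Unchanged;
        }

        // No usable validator: the caller has to fall back to hashing the body.
        return Freshness.Unknown;
    }
}
```

With this shape, the expensive hash of the streamed resource is only needed for servers that send neither header.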