Actually I have two questions.
(1) Is there any reduction in processing power or bandwidth used on the remote server if I retrieve only the headers, as opposed to the full page, using PHP and cURL?
(2) Since I think (and I might be wrong) that the answer to the first question is yes, I am trying to get only the last-modified date, or If-Modified-Since header, of the remote file in order to compare it with the time and date of the locally stored data, so that I can store the file locally if it has changed. However, my script seems unable to fetch that piece of information; I get NULL when I run this:
class last_change {
    public $last_change;
    function set_last_change() {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, "http://url/file.xml");
        curl_setopt($curl, CURLOPT_HEADER, true);
        curl_setopt($curl, CURLOPT_FILETIME, true);
        curl_setopt($curl, CURLOPT_NOBODY, true);
        // $header = curl_exec($curl);
        $this -> last_change = curl_getinfo($header);
        curl_close($curl);
    }
    function get_last_change() {
        return $this -> last_change['datetime']; // I have tested with Last-Modified & If-Modified-Since to no avail
    }
}
If the line $header = curl_exec($curl) is uncommented, the header data is displayed, even though I haven't asked for it to be printed, and it looks like this:
HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 12:15:51 GMT
Server: Apache/2.2.8 (Linux/SUSE)
Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT
ETag: "198054-118c-472abc735ab80"
Accept-Ranges: bytes
Content-Length: 4492
Content-Type: text/xml
Based on that, the server does return 'Last-Modified'.
So, what am I doing wrong?
Why use cURL for this? There is a PHP function for that, get_headers(), which returns the response headers as an array.
It should be easy to get the Content-Type from this.
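A minimal sketch, assuming the default numerically indexed result of get_headers(): one 'Name: value' string per element, with the status line first. The helper find_header() and the sample array are hypothetical; the sample reuses the response from the question so no network call is needed.

```php
<?php
// Hypothetical helper. get_headers($url) returns a numerically indexed array such as
// [0 => 'HTTP/1.1 200 OK', 1 => 'Date: ...', ...]; this returns the value of one
// header, matching its name case-insensitively, or null if it is absent.
function find_header(array $headerLines, string $name): ?string
{
    $prefix = $name . ':';
    $value = null;
    foreach ($headerLines as $line) {
        if (strncasecmp($line, $prefix, strlen($prefix)) === 0) {
            // Keep the last match: after redirects, the final response comes last.
            $value = trim(substr($line, strlen($prefix)));
        }
    }
    return $value;
}

// Sample array shaped like get_headers() output for the response in the question.
$lines = [
    'HTTP/1.1 200 OK',
    'Date: Fri, 04 Sep 2009 12:15:51 GMT',
    'Server: Apache/2.2.8 (Linux/SUSE)',
    'Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT',
    'ETag: "198054-118c-472abc735ab80"',
    'Accept-Ranges: bytes',
    'Content-Length: 4492',
    'Content-Type: text/xml',
];

echo find_header($lines, 'content-type'), "\n"; // text/xml
```

In a live script the array would come from get_headers('http://url/file.xml'). Keeping the last match is a deliberate choice: when redirects are followed, the headers of every response appear in order, so the last value belongs to the final response.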
You could also pass format=1 as the second argument to get_headers(), in which case it returns an associative array keyed by header name.
More reading: the get_headers() page of the PHP manual (php.net).
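With format=1 the keys are the header names as the server sent them, so a case-insensitive lookup is safer, and a header that appears in more than one response (for example after a redirect) comes back as an array. The sketch below uses a hypothetical helper, last_modified_timestamp(), to turn Last-Modified into a Unix timestamp with strtotime(); the sample array is made up from the headers in the question.

```php
<?php
// Hypothetical helper for the associative result of get_headers($url, 1).
// Header names are matched case-insensitively; if the header was sent more than
// once (e.g. across redirects) its value is an array and the last entry is used.
function last_modified_timestamp(array $headers): ?int
{
    foreach ($headers as $name => $value) {
        if (is_string($name) && strcasecmp($name, 'Last-Modified') === 0) {
            $date = is_array($value) ? end($value) : $value;
            if (!is_string($date)) {
                return null;
            }
            $ts = strtotime($date); // false if the date cannot be parsed
            return $ts === false ? null : $ts;
        }
    }
    return null;
}

// Sample associative array made up from the headers in the question.
$headers = [
    0 => 'HTTP/1.1 200 OK',
    'Last-Modified' => 'Thu, 03 Sep 2009 12:46:54 GMT',
    'Content-Type' => 'text/xml',
];

echo gmdate('Y-m-d H:i:s', last_modified_timestamp($headers)), "\n"; // 2009-09-03 12:46:54
```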
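You can also set the default stream context (for example, so that get_headers() sends a HEAD request instead of a GET) and then call get_headers() as usual.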
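A sketch of the stream-context approach, assuming the goal is to make get_headers() send HEAD instead of its default GET. The wrapper name fetch_head_headers() is hypothetical. Be aware that this changes the default for every http:// stream operation in the script; on PHP 8.0 and later, passing a context made with stream_context_create() as the third argument of get_headers() keeps the change local to that call.

```php
<?php
// get_headers() sends a GET request by default. Switching the default stream
// context to HEAD makes later get_headers() calls request only the headers.
// Note: this default applies to every http:// stream operation in the script.
stream_context_set_default([
    'http' => [
        'method' => 'HEAD',
    ],
]);

// Hypothetical wrapper: associative header array (format=1), or false on failure.
function fetch_head_headers(string $url)
{
    return get_headers($url, 1);
}

// Example call (performs a network request):
// $headers = fetch_head_headers('http://url/file.xml');
```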
get_headers() seems to be more efficient than cURL, since it skips steps such as triggering authentication routines like login prompts or cookies.
Here is my approach: request the headers with CURLOPT_HEADER, then parse the output string into a map.
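A minimal sketch of such a parser, assuming the raw string comes from curl_exec() with CURLOPT_HEADER, CURLOPT_NOBODY and CURLOPT_RETURNTRANSFER all set. The function name parse_header_block() is hypothetical. Keys are lower-cased for case-insensitive lookup, and when cURL has followed redirects only the last header block (the final response) is kept.

```php
<?php
// Hypothetical parser for the string curl_exec() returns when CURLOPT_HEADER,
// CURLOPT_NOBODY and CURLOPT_RETURNTRANSFER are all set.
function parse_header_block(string $raw): array
{
    // One block per response; after followed redirects the final response is last.
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    $last = end($blocks);
    $headers = [];
    foreach (preg_split('/\r?\n/', $last) as $line) {
        // Status lines such as "HTTP/1.1 200 OK" have no colon and are skipped.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // Lower-cased keys; a repeated header keeps its last value.
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $headers;
}

// Sample usage with part of the response shown in the question.
$sample = "HTTP/1.1 200 OK\r\n"
    . "Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT\r\n"
    . "Content-Type: text/xml\r\n\r\n";
$map = parse_header_block($sample);
echo $map['last-modified'], "\n"; // Thu, 03 Sep 2009 12:46:54 GMT
```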
You need to add curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); so that curl_exec() returns the header instead of printing it.
Whether returning only the headers is lighter on the server depends on the script that's running, but usually it will be.
I think you also want "filetime" instead of "datetime".
You are passing $header to curl_getinfo(). It should be $curl (the cURL handle), and curl_exec() must actually run before that call (it is commented out in the question), because curl_getinfo() reports on a completed transfer. You can get just the filetime by passing CURLINFO_FILETIME as the second parameter to curl_getinfo(). (Often the filetime is unavailable, in which case it is reported as -1.)
Your class seems wasteful, though, since it throws away a lot of information that could be useful.
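One possible shape for such a class, as a sketch only: the class name RemoteFileInfo and its methods are hypothetical. It addresses the issues discussed above (curl_exec() actually runs, the handle rather than the result goes to curl_getinfo(), CURLOPT_RETURNTRANSFER stops the headers being printed) and keeps the whole curl_getinfo() array so nothing is thrown away. It assumes PHP 7.4+ for typed properties.

```php
<?php
// Hypothetical class: issues a HEAD request with cURL and keeps the whole
// curl_getinfo() array instead of a single value.
class RemoteFileInfo
{
    private string $url;
    private array $info = [];
    private string $rawHeaders = '';

    public function __construct(string $url)
    {
        $this->url = $url;
    }

    // Performs the HEAD request; returns false if the transfer failed.
    public function fetch(): bool
    {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_NOBODY, true);         // HEAD request, no body
        curl_setopt($curl, CURLOPT_HEADER, true);         // include headers in the output
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the output instead of printing it
        curl_setopt($curl, CURLOPT_FILETIME, true);       // ask cURL to record the remote file time
        $result = curl_exec($curl);
        // Pass the handle ($curl), not the curl_exec() result, to curl_getinfo().
        $this->info = curl_getinfo($curl);
        curl_close($curl); // frees the handle before PHP 8.0; a no-op since then
        if ($result === false) {
            return false;
        }
        $this->rawHeaders = $result;
        return true;
    }

    // Unix timestamp of the remote file, or -1 when unknown (or not fetched yet).
    public function getFiletime(): int
    {
        return (int) ($this->info['filetime'] ?? -1);
    }

    public function getHttpCode(): int
    {
        return (int) ($this->info['http_code'] ?? 0);
    }

    public function getInfo(): array
    {
        return $this->info;
    }

    public function getRawHeaders(): string
    {
        return $this->rawHeaders;
    }
}
```

Typical use would be to construct it with 'http://url/file.xml', call fetch(), and check getFiletime() against -1 before treating the value as a timestamp.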
Yes, the load will be lighter on the server, since it is only returning the HTTP header (responding, after all, to a HEAD request). How much lighter will vary greatly.
(1) Yes. A HEAD request (as you're issuing in this case) is far lighter on the server because it only returns the HTTP headers, whereas a standard GET request returns the headers and the content.
(2) You need to set the CURLOPT_RETURNTRANSFER option to true before you call curl_exec() to have the content returned, as opposed to printed. That should also make your class work correctly.
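For the comparison described in the question (store the remote file locally only if it has changed), a sketch follows. needs_refresh() and store_copy() are hypothetical helpers; the remote time is assumed to be a Unix timestamp such as the one returned by curl_getinfo($curl, CURLINFO_FILETIME), where -1 means unknown and is treated as "changed".

```php
<?php
// Hypothetical helpers. $remoteFiletime is a Unix timestamp such as
// curl_getinfo($curl, CURLINFO_FILETIME); cURL uses -1 when the time is unknown.
function needs_refresh(int $remoteFiletime, string $localPath): bool
{
    if ($remoteFiletime === -1) {
        return true; // no usable remote time: assume the file changed
    }
    clearstatcache(true, $localPath); // PHP caches stat results such as filemtime()
    if (!is_file($localPath)) {
        return true; // nothing stored locally yet
    }
    return $remoteFiletime > filemtime($localPath);
}

// Saves the downloaded body and stamps it with the server's time, so the next
// comparison uses the server's clock on both sides.
function store_copy(string $localPath, string $body, int $remoteFiletime): void
{
    file_put_contents($localPath, $body);
    if ($remoteFiletime !== -1) {
        touch($localPath, $remoteFiletime);
    }
}
```

Stamping the local copy with the server's time via touch() means both sides of the comparison come from the server's clock, so a skewed local clock cannot cause a missed or spurious update.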