I'm struggling with automated data collection by a PHP script from a web server. The files in question contain meteorological data and are updated every 10 minutes. Oddly, the 'file modified' date on the web server doesn't change.
A simple fopen('http://...') call tries to get the freshest version of the last file in this directory every hour, but I regularly end up with a version up to 4 hours old. This happens on a Linux server which (as my system administrator has assured me) doesn't use a proxy server of any kind.
Does PHP implement its own caching mechanism? Or what else could be interfering here?
(My current workaround is to grab the file via exec('wget --no-cache...'), which works.)
Since you're getting the file via HTTP, I'm assuming that PHP will be honouring any cache headers the server is responding with.
A simple, if dirty, way to avoid that is to append a random GET parameter to each request.
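A minimal sketch of that idea follows. The helper name cacheBustedUrl, the parameter name nocache and the example.com URL are placeholders of mine, not anything from the question; the point is only that each request URL differs, so a cache keyed on the URL cannot return an old copy.

```php
<?php
// Hypothetical helper: append a unique query parameter so every request URL differs.
function cacheBustedUrl(string $url, string $param = 'nocache'): string
{
    // Use '?' for the first parameter, '&' if the URL already has a query string
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    // Timestamp plus a random number makes a repeated value very unlikely
    return $url . $separator . $param . '=' . time() . mt_rand();
}

// Usage (placeholder URL, not the asker's real one):
// $handle = fopen(cacheBustedUrl('http://example.com/data/latest.dat'), 'r');
```

This only helps if the stale copy comes from a cache that keys on the full URL; a server that ignores the query string and serves its own stale copy would be unaffected.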
The question concerns apparent caching of content fetched with fopen('http://...'), and the poster wondered whether PHP implements its own caching mechanism. The other answers include some speculation, but surely the easiest way to find out is to check the source code or, perhaps easier, to trace the system calls and see what is going on. This is trivial to do on Debian systems as follows:
I've included the relevant extract of the strace log below. What it shows is that the PHP runtime simply connects to localhost:80, sends a "GET /xx.txt", and gets a response comprising headers and file content, which it then echoes to STDOUT.
Absolutely no client-side caching occurs within the PHP runtime, and since this is a direct HTTP socket dialogue, it is hard to envision where caching could occur on the client. We are left with the possibility of server-side or intermediate proxy caching. (Note that I default to an Expires of access + 7 days on txt files.)
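One way to look for server-side or proxy caching from the PHP side is to inspect the response headers. The sketch below is an illustration only: the helper name cachingHeaders and the list of headers it keeps are my own choices, and the URL in the usage comment is a placeholder. A Via or Age header, or an Expires value far in the future, would point to a cache outside PHP.

```php
<?php
// Hypothetical helper: pick out the headers that hint at server-side or proxy caching.
function cachingHeaders(array $headerLines): array
{
    $interesting = ['date', 'last-modified', 'expires', 'cache-control', 'age', 'via', 'etag'];
    $found = [];
    foreach ($headerLines as $line) {
        // Skip the status line ("HTTP/1.1 200 OK") and anything else without a colon
        if (strpos($line, ':') === false) {
            continue;
        }
        // Split on the first colon only, since date values contain colons too
        [$name, $value] = explode(':', $line, 2);
        $name = strtolower(trim($name));
        if (in_array($name, $interesting, true)) {
            $found[$name] = trim($value);
        }
    }
    return $found;
}

// Usage (placeholder URL): get_headers() sends a request and returns the raw
// header lines, or false on failure.
// $lines = get_headers('http://example.com/data/latest.dat');
// if ($lines !== false) { print_r(cachingHeaders($lines)); }
```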
Logfile Extract
Why not try cURL? I think it is better suited to this.
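A rough sketch of what a cURL-based fetch could look like, assuming the PHP curl extension is installed. The function names freshFetchOptions and fetchFresh, the 30-second timeout and the example.com URL are placeholders of mine. The request headers ask any cache along the path for a fresh copy, which is similar in spirit to the wget workaround from the question.

```php
<?php
// Hypothetical helper: cURL options that ask caches on the path for a fresh copy.
function freshFetchOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FRESH_CONNECT  => true,  // do not reuse a previously opened connection
        CURLOPT_HTTPHEADER     => ['Cache-Control: no-cache', 'Pragma: no-cache'],
        CURLOPT_TIMEOUT        => 30,    // assumed limit in seconds
    ];
}

// Fetch the URL and return its body, or false on failure.
function fetchFresh(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, freshFetchOptions($url));
    // The handle is released when $ch goes out of scope
    return curl_exec($ch);
}

// Usage (placeholder URL):
// $data = fetchFresh('http://example.com/data/latest.dat');
```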
So if I'm understanding you correctly, part of the problem might be that the *.dat file always has a timestamp of 1:00 AM? Do you have control of the server containing the data (http://www.iac.ethz.ch/php/chn_meteo_roof/)? If so, you should try to find out why the data always has the same timestamp. I have to believe it is being set intentionally: the OS will update the timestamp when the file is modified unless you go out of your way to prevent it. If you can't figure out why it is being set to 1 AM, you could at least run a "touch" command on the file, which will update its modified timestamp. This is all, of course, assuming you have some access to the server providing the files.
Maybe using a POST request can resolve your problem (as far as I know, POST responses can't be cached).
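A sketch of how a POST request could be issued with PHP's HTTP stream wrapper. The helper name postContextOptions, the header values, the timeout and the URL are my assumptions, not the original poster's code. Note also that a server may refuse POST for a static file, so this is worth testing against the real server first.

```php
<?php
// Hypothetical helper: stream-context options for an empty POST request.
function postContextOptions(): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . "Cache-Control: no-cache\r\n",
            'content' => '',   // empty request body
            'timeout' => 30,   // assumed limit in seconds
        ],
    ];
}

// Usage (placeholder URL):
// $context = stream_context_create(postContextOptions());
// $data = file_get_contents('http://example.com/data/latest.dat', false, $context);
```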