I'm loading XML files from disk using file_get_contents(). As a test, I find I can load a 156K file with file_get_contents() 1,000 times in 3.99 seconds. I've subclassed the part that does the loading and replaced it with a memcache layer, and on my dev machine I can do 1,000 loads of the same document in 4.54 seconds.
I appreciate that file_get_contents() will do some caching, but it looks like it is actually faster than a well-known caching technique. On a single server, is the performance of file_get_contents() as good as one can get?
I'm on PHP 5.2.17 via MacPorts, OS X 10.6.8.
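For anyone wanting to reproduce the measurement, here is a minimal sketch of the kind of timing loop described above. The helper name time_loader() and the generated temp file (a stand-in for the real 156K document) are assumptions for illustration, not the original test code.

```php
<?php
// Hypothetical helper: time $iterations calls of a loader callback on one path.
function time_loader($loader, $path, $iterations)
{
    $data = '';
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $data = call_user_func($loader, $path);
    }
    return array(
        'seconds' => microtime(true) - $start,
        'bytes'   => strlen($data),
    );
}

// Stand-in for the 156K XML document: a temp file of roughly that size.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, str_repeat('<item>value</item>', 9000));

$result = time_loader('file_get_contents', $path, 1000);
printf("%d loads of %d bytes in %.2f s\n", 1000, $result['bytes'], $result['seconds']);

unlink($path);
```

Because the loader is passed as a callback, the same loop can time any other loading function that takes a path, which keeps the comparison like for like.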
Edit: I've found that on XML documents of this size, there is a small benefit to be had in using the MEMCACHE_COMPRESSED flag. 1,500 loads via memcache take 6.44 sec with compression rather than 6.74 sec without. However, both are slower than file_get_contents(), which does the same number of loads in 5.71 sec.
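A rough sketch of what a memcache-backed loader with the compression flag might look like, assuming the pecl/memcache extension and a memcached instance on 127.0.0.1:11211. The function name load_xml_cached(), the key scheme, and the file path are hypothetical and not taken from the subclass mentioned above.

```php
<?php
// Read-through loader: try the cache first, fall back to disk and fill the cache.
// $cache is expected to be a connected Memcache instance (pecl/memcache).
function load_xml_cached($cache, $path, $flags = 0)
{
    $key = 'xml:' . md5($path);          // hypothetical key scheme
    $xml = $cache->get($key);
    if ($xml === false) {                // Memcache::get() returns false on a miss
        $xml = file_get_contents($path);
        if ($xml !== false) {
            $cache->set($key, $xml, $flags, 0);   // expire 0 = no expiry
        }
    }
    return $xml;
}

$path = '/path/to/document.xml';         // hypothetical location

if (class_exists('Memcache') && is_readable($path)) {
    $cache = new Memcache();
    // Assumed: memcached running locally on the default port.
    if (@$cache->connect('127.0.0.1', 11211)) {
        // MEMCACHE_COMPRESSED asks the extension to zlib-compress the stored value.
        $xml = load_xml_cached($cache, $path, MEMCACHE_COMPRESSED);
        printf("Loaded %d bytes\n", strlen($xml));
    }
}
```

Passing 0 instead of MEMCACHE_COMPRESSED gives the uncompressed variant, so the two timings can be taken with the same function.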
Because file_get_contents mmaps the file, you'll only have a few file system calls, and the data will end up in the file system cache. memcache, by contrast, involves out-of-process calls to memcached (and off-server calls in a clustered implementation).
The performance of file_get_contents() depends crucially on the type of file system. For example, a file on an NFS-mounted file system is not mmapped, and access to it can be a LOT slower. Also, on a multi-user server the file system cache can be rapidly flushed by other processes, whereas the memcached cache will almost certainly stay in memory.
file_get_contents is the simplest way to retrieve a file. The underlying operating system (especially Linux) already has efficient caching mechanisms. Anything else you do just creates overhead and slows things down.
Memcache would make sense if you loaded these files from a remote location.
Edit: It is not necessarily true that file_get_contents is the simplest way. fopen/fgets might be even faster - I don't know. But the differences should be minor compared to the complexity of a caching layer.
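If you want to check this yourself, here is a minimal sketch that reads the same file both ways and times each. The helper read_with_fopen() and the generated temp file are made up for illustration, and the result will vary by PHP version and file system.

```php
<?php
// Read a whole file line by line with fopen()/fgets() instead of file_get_contents().
function read_with_fopen($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $data = '';
    while (($line = fgets($handle)) !== false) {
        $data .= $line;
    }
    fclose($handle);
    return $data;
}

// Hypothetical test document of about 170K.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, str_repeat("<item>value</item>\n", 9000));

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $a = file_get_contents($path);
}
$fgcSeconds = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $b = read_with_fopen($path);
}
$fopenSeconds = microtime(true) - $start;

printf("file_get_contents: %.2f s, fopen/fgets: %.2f s\n", $fgcSeconds, $fopenSeconds);
unlink($path);
```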
Storing XML files in memcache makes very little sense to me.
I'd rather store parsed values, saving me both reading and parsing.
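A small sketch of what I mean, assuming the SimpleXML extension and a made-up document shape (a catalog of items with an id attribute and a name element). The function parse_items() is hypothetical. The serialize()/unserialize() pair only stands in for the round trip through a cache.

```php
<?php
// Hypothetical document shape:
// <catalog><item id="..."><name>...</name></item>...</catalog>
function parse_items($xml)
{
    $doc = simplexml_load_string($xml);
    $items = array();
    foreach ($doc->item as $item) {
        // Cast to plain strings: SimpleXMLElement objects cannot be serialized,
        // so only scalars and arrays should go into the cache.
        $items[(string) $item['id']] = (string) $item->name;
    }
    return $items;
}

$xml = '<catalog>'
     . '<item id="a1"><name>Widget</name></item>'
     . '<item id="b2"><name>Gadget</name></item>'
     . '</catalog>';

$items = parse_items($xml);

// This plain array is what would be cached, e.g. $cache->set($key, $items).
// serialize()/unserialize() stands in for the cache round trip here.
$restored = unserialize(serialize($items));
print_r($restored);
```

On a cache hit you get the array back directly, with no file read and no XML parsing.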