We have a PHP application, and were thinking it might be advantageous to have the application know if there was a change in its makeup since the last execution. Mainly due to managing caches and such, and knowing that our applications are sometimes accessed by people who don't remember to clear the cache on changes. (Changing the people is the obvious answer, but alas, not really achievable)
We've come up with this, which is the fastest we've managed to eke out, running an average 0.08 on a developer machine for a typical project. We've experimented with shasum,md5 and crc32, and this is the fastest. We are basically md5ing the contents of every file, and md5'ing that output. Security isnt a concern, we're just interested in detecting filesystem changes via a differing checksum.
time (find application/ -path '*/.svn' -prune -o -type f -print0 | xargs -0 md5 | md5)
I suppose the question is, can this be optimised any further?
(I realise that pruning svn will have a cost, but find takes the least amount of time out of the components, so it will be pretty minimal. We're testing this on a working copy atm)
Instead of going by file contents, you can use the same technique with filename and timestamps:
This is much faster than reading and hashing the contents of each file.
Insteading of actively searching for changes, why not getting notified when something changes. Have a look at PHP's FAM - File Alteration Monitor API
CAVEAT: requires an additional daemon on the machine and the PECL extension is unmaintained.
We didn't want to use FAM, since we would need to install it on the server, and thats not always possible. Sometimes clients are insistent we deploy on their broken infrastructure. Since it's discontinued, its hard to get that change approved red tape wise also.
The only way to improve the speed of the original version in the question is to make sure your file list is as succinct as possible. IE only hash the directories/files that really matter if changed. Cutting out directories that aren't relevant can give big speed boosts.
Past that, the application was using the function to check if there were changes in order to perform a cache clear if there were. Since we don't really want to halt the application while its doing this, this sort of thing is best farmed out carefully as an asynchronous process using fsockopen. That gives the best 'speed boost' overall, just be careful of race conditions.
Marking this as the 'answer' and upvoting the FAM answer.
since you have svn, why don't you go by revisions. i realise you are skipping svn folders but i suppose you did that for speed advantage and that you do not have modified files in your production servers...
that beeing said, you do not have to re invent the wheel.
you could speed up the process by only looking at metadata read from the directory indexes (modification timestamp, filesize, etc)
you could also stop once you spotted a difference (should theoretically reduce the time by half in average) etc. there is a lot.
i honestly think the best way in this case is to just use the tools already available.
the linux tool
diff
has a-q
option (quick).you will need to use it with the recursive parameter
-r
as well.diff -r -q dir1/ dir2/
it uses a lot of optimisations and i seriously doubt you can significantly improve upon it without considerable effort.
You can be notified of filesystem modifications using the inotify extension.
It can be installed with pecl:
Or manually (download, phpize && ./configure && make && make install as usual).
This is a raw binding over the linux inotify syscalls, and is probably the fastest solution on linux.
See this example of a simple
tail
implementation: http://svn.php.net/viewvc/pecl/inotify/trunk/tail.php?revision=262896&view=markupIf you want a higher level library, or suppot for non-linux systems, take a look at Lurker.
It works on any system, and can use inotity when it's available.
See the example from the README:
Definitely what you should be using is Inotify its fast and easy to configure, multiple options directly from bash or php of dedicate a simple node-inotify instance for this task
But
Inotify
does not worn on windows but you can easy write a command line application with FileSystemWatcher or FindFirstChangeNotification and call viaexec
If you are looking for only
PHP solution
then its pretty difficult and you might not get the performance want because the only way is to scan that folder continuouslyHere is a Simple Experiment
Example
Cache Used
The Experiment Class