I have a small application which monitors a directory tree for specific types of file names (*.monitored). It counts the number of matching files, uses inotify to watch for events that indicate matching files being added or deleted, and can be polled to report the current number of files and the average rate at which files have been added and removed over the past few seconds. The directory tree can contain hundreds of thousands of files, so I'm trying to avoid maintaining a list of monitored files.
If I run:
touch foo.monitored
I get IN_CREATE, and I set num_files=1
touch foo.ignored
I get IN_CREATE, ignore it, and leave num_files=1
mv foo.ignored foo.monitored
generates:
IN_MOVED_FROM for foo.ignored, which I ignore, so num_files stays at 1
IN_MOVED_TO for foo.monitored, which I take as a new file, so I set num_files=2. However, the old foo.monitored has been overwritten, so my total is wrong.
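A minimal C reconstruction of the counting logic described above. It assumes the counter adds one for IN_CREATE or IN_MOVED_TO and subtracts one for IN_DELETE or IN_MOVED_FROM on any name ending in .monitored; the helper names on_event and is_monitored are hypothetical. Events are fed in by hand rather than read from an inotify descriptor, so the overcount can be reproduced without touching the filesystem.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/* True if name ends in ".monitored" (the question's filter). */
static bool is_monitored(const char *name)
{
    static const char suffix[] = ".monitored";
    size_t n = strlen(name);
    size_t s = sizeof suffix - 1;
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Naive running total, as described in the question. */
static long num_files = 0;

/* Hypothetical handler: called with each event's mask and name. */
static void on_event(uint32_t mask, const char *name)
{
    if (!is_monitored(name))
        return;                      /* foo.ignored never affects the count */
    if (mask & (IN_CREATE | IN_MOVED_TO))
        num_files++;                 /* cannot tell a new file from an overwrite */
    else if (mask & (IN_DELETE | IN_MOVED_FROM))
        num_files--;
}
```

Feeding it the events from the example (IN_CREATE foo.monitored, IN_CREATE foo.ignored, IN_MOVED_FROM foo.ignored, IN_MOVED_TO foo.monitored) leaves num_files at 2, even though only one foo.monitored exists.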
I can't find an event signalling the demise of the old foo.monitored. Is there a way to do what I want without maintaining a huge structure of filenames?
Thanks!
No, inotify will not help you here. It does not emit a delete event for a file that is replaced by a rename.
Perhaps a compromise would be to record how many monitored files there are in each directory, and then rescan just that one directory each time you get an ambiguous event, such as an IN_MOVED_TO for a monitored name?
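A sketch of the per-directory recount in C, assuming the event handler keeps a cached count for each watched directory in step with the global total. count_monitored_in_dir, struct dir_count and resync_dir are hypothetical names; only opendir, readdir and closedir are standard POSIX calls. On an ambiguous event the directory is rescanned and the total is corrected by the difference between the fresh and cached counts, which undoes a spurious +1 from an overwriting rename.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* True if name ends in ".monitored". */
static bool is_monitored(const char *name)
{
    static const char suffix[] = ".monitored";
    size_t n = strlen(name);
    size_t s = sizeof suffix - 1;
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Count entries in ONE directory (not recursive) whose names end in
 * ".monitored". Only names are checked, not file types.
 * Returns -1 if the directory cannot be opened. */
static long count_monitored_in_dir(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL)
        if (is_monitored(ent->d_name))
            count++;
    closedir(dir);
    return count;
}

/* Hypothetical per-directory record, kept in step with the global total
 * by the event handler. */
struct dir_count {
    const char *path;
    long count;
};

/* After an ambiguous event in dc->path, recount that directory and apply
 * the difference to the total. Leaves both unchanged if the scan fails. */
static void resync_dir(struct dir_count *dc, long *total)
{
    long fresh = count_monitored_in_dir(dc->path);
    if (fresh < 0)
        return;
    *total += fresh - dc->count;
    dc->count = fresh;
}
```

The cost of each ambiguous event is then one readdir pass over a single directory, rather than over the whole tree.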
Using inotify to monitor directory trees has bigger problems, however. Have you considered what happens if a directory with thousands of monitored files is moved into or out of your tree? Even moving directories within the tree is problematic.
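When a directory is moved into the tree, the watch on its new parent reports a single IN_MOVED_TO with the IN_ISDIR flag set, and nothing for the files inside it, so its contents have to be counted by walking it. A minimal sketch using POSIX nftw() follows; count_monitored_tree is a hypothetical helper, and the same walk could serve for a full rescan of the tree. It does not solve the moved-out case, where the directory is no longer there to count.

```c
#define _XOPEN_SOURCE 700   /* for nftw() */
#include <ftw.h>
#include <stdbool.h>
#include <string.h>

/* True if name ends in ".monitored". */
static bool is_monitored(const char *name)
{
    static const char suffix[] = ".monitored";
    size_t n = strlen(name);
    size_t s = sizeof suffix - 1;
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* nftw() callbacks take no user pointer, so the tally lives in a static
 * variable; this makes the helper non-reentrant. */
static long tree_count;

static int count_cb(const char *fpath, const struct stat *sb, int typeflag,
                    struct FTW *ftwbuf)
{
    (void)sb;
    /* fpath + ftwbuf->base is the entry's basename. */
    if (typeflag == FTW_F && is_monitored(fpath + ftwbuf->base))
        tree_count++;
    return 0;                        /* keep walking */
}

/* Hypothetical helper: count *.monitored files under root, recursively,
 * without following symlinks. Returns -1 if the walk fails. */
static long count_monitored_tree(const char *root)
{
    tree_count = 0;
    if (nftw(root, count_cb, 16, FTW_PHYS) != 0)
        return -1;
    return tree_count;
}
```

On an IN_MOVED_TO carrying IN_ISDIR, a handler could add count_monitored_tree() of the new path to the total, and would also need to add watches for the new subdirectories, since inotify watches are not recursive.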
Edit: other ideas:
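Add an inotify watch on each file, individually. With hundreds of thousands of files this is probably not a good plan, since every watch counts against the per-user limit in /proc/sys/fs/inotify/max_user_watches.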
A counter can only ever be accurate at the moment you read it; any caller that reads the count and then expects it to still match a later read has a nasty race condition waiting to happen. Therefore, it is probably OK to accept that the counter may be slightly wrong, and correct it as and when you get the opportunity.
Do a full scan after every 5 move events.
After a move event, wait 30 seconds to see if there are any more, and only then do the scan.
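The two rescan triggers above can be combined into a small policy object. This is a sketch with hypothetical names (struct rescan_policy, note_move, should_scan, scan_done); the thresholds of 5 moves and 30 seconds come from the suggestions, and the caller is assumed to pass in the current time (for example from time(NULL)) and to run the full scan itself.

```c
#include <stdbool.h>
#include <time.h>

/* Thresholds taken from the two suggestions; both are tunable guesses. */
#define MOVES_BEFORE_SCAN 5
#define QUIET_SECONDS     30

struct rescan_policy {
    int pending_moves;   /* move events seen since the last full scan */
    time_t last_move;    /* time of the most recent move event */
};

/* Record one move event at time `now`. */
static void note_move(struct rescan_policy *p, time_t now)
{
    p->pending_moves++;
    p->last_move = now;
}

/* Scan when enough moves have piled up, or when moves are pending and
 * none has arrived for QUIET_SECONDS. */
static bool should_scan(const struct rescan_policy *p, time_t now)
{
    if (p->pending_moves == 0)
        return false;
    return p->pending_moves >= MOVES_BEFORE_SCAN
        || difftime(now, p->last_move) >= QUIET_SECONDS;
}

/* Call after the full scan has corrected the counter. */
static void scan_done(struct rescan_policy *p)
{
    p->pending_moves = 0;
}
```

For the quiet-period rule to fire when no further events arrive, the main loop would need to wake up periodically, for example by calling poll() on the inotify descriptor with a timeout and checking should_scan() each time it returns.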
Split the tree into sections ("buckets"), and record a count for each. This should reduce the scan overhead.
Record a hash for each monitored file path. This might take less memory, and be less trouble, than recording the actual file names.
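A minimal sketch of the path-hash idea in C, assuming each event's full path is built from the watched directory's path plus the event name. It stores 64-bit FNV-1a hashes in a fixed-size open-addressing table, one uint64_t per slot instead of a whole string; FNV-1a, the table layout and every name here (path_set, path_set_add, path_set_remove) are illustrative choices, not anything inotify provides. An overwriting IN_MOVED_TO finds its hash already present, so the count does not change. Two different paths that collide on the same hash would be undercounted, which fits the idea of accepting a slightly wrong counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Slot markers; real hashes are remapped away from these two values. */
#define SLOT_EMPTY     0u
#define SLOT_TOMBSTONE 1u

/* 64-bit FNV-1a hash of a full path. */
static uint64_t path_hash(const char *path)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *path != '\0'; path++) {
        h ^= (unsigned char)*path;
        h *= 1099511628211ULL;
    }
    return h < 2 ? h + 2 : h;
}

struct path_set {
    uint64_t *slots;
    size_t cap;     /* must be a non-zero power of two */
    size_t used;    /* live entries plus tombstones */
    size_t count;   /* live entries, i.e. the number of monitored files */
};

static bool path_set_init(struct path_set *s, size_t cap_pow2)
{
    s->slots = calloc(cap_pow2, sizeof *s->slots);
    s->cap = cap_pow2;
    s->used = 0;
    s->count = 0;
    return s->slots != NULL;
}

/* Linear probing. Sets *found and returns h's slot if present; otherwise
 * returns the slot to insert into (the first tombstone passed, if any). */
static size_t find_slot(const struct path_set *s, uint64_t h, bool *found)
{
    size_t mask = s->cap - 1;
    size_t i = (size_t)(h & mask);
    size_t tomb = SIZE_MAX;
    for (;;) {
        uint64_t v = s->slots[i];
        if (v == h) {
            *found = true;
            return i;
        }
        if (v == SLOT_EMPTY) {
            *found = false;
            return tomb != SIZE_MAX ? tomb : i;
        }
        if (v == SLOT_TOMBSTONE && tomb == SIZE_MAX)
            tomb = i;
        i = (i + 1) & mask;
    }
}

/* IN_CREATE / IN_MOVED_TO. An overwrite finds the hash already present and
 * leaves the count alone. Returns false if the table is too full. */
static bool path_set_add(struct path_set *s, const char *path)
{
    bool found;
    uint64_t h = path_hash(path);
    size_t i = find_slot(s, h, &found);
    if (found)
        return true;
    if (s->slots[i] == SLOT_EMPTY) {
        if ((s->used + 1) * 4 > s->cap * 3)
            return false;            /* keep load <= 3/4 so probing ends */
        s->used++;
    }
    s->slots[i] = h;
    s->count++;
    return true;
}

/* IN_DELETE / IN_MOVED_FROM. */
static void path_set_remove(struct path_set *s, const char *path)
{
    bool found;
    size_t i = find_slot(s, path_hash(path), &found);
    if (found) {
        s->slots[i] = SLOT_TOMBSTONE;
        s->count--;
    }
}
```

The table never grows in this sketch; a real version would rehash into a larger table (which also clears tombstones) once path_set_add() returns false.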