I have a bunch of flows and data processing applications that I occasionally need to spy on, meaning I need to know what files they read. This is mostly to aid in packaging testcases, but can also be useful when debugging.
Is there a way to run the executables in such a way that produces such a list?
I have two thoughts on this:
- There is a command that I can invoke and that command invokes my apps. Something along the lines of GDB: I call GDB, give it a path to the executable and some arguments, and GDB calls it for me. Perhaps there's something similar that tells me how system resources are used.
- Maybe the more interesting (but unnecessary side path) solution:
  - create a library called libc.so which implements fopen (and some others)
  - change LD_LIBRARY_PATH to point at the new library
  - make a copy of the real libc.so and rename fopen (nepof, perhaps) in an editor
  - my library loads the copy and calls the renamed function as necessary to provide fopen functionality
  - call the app, which then calls my proxy fopen
Alternative #1 would certainly be the preferable one but comments on how to do #2 more easily are welcome too.
One option is to use strace:
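A minimal sketch of such an invocation, wrapped in a small helper so the log file and command are easy to swap. The helper name trace_files and the names trace.log and ./myapp are placeholders; -f follows child processes, -e trace=file limits the output to syscalls that take a file name (newer strace versions also spell this %file), and -o writes the log to a file.

```shell
# Run a command under strace, logging file-related syscalls to a log file.
# Usage: trace_files LOGFILE COMMAND [ARGS...]
trace_files() {
    log=$1
    shift
    strace -f -e trace=file -o "$log" "$@"
}

# Example (placeholder names):
#   trace_files trace.log ./myapp arg1 arg2
```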
This logs every file-open event, but it imposes a performance penalty that may be significant. Its advantage is that it is easy to use.
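Another option is to use LD_PRELOAD, which corresponds to your option #2. The basic idea is to write a small shared library that defines its own open(), records the path, and forwards the call to the real open() (looked up with dlsym(RTLD_NEXT, "open")). You then build it as a shared object and run your program with LD_PRELOAD pointing at it.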
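Roughly, the whole thing could look like the sketch below. It assumes Linux with glibc and gcc; the file names hook.c and hook.so and the program ./myapp are placeholders, and the hook only covers open() and ignores corner cases such as O_TMPFILE.

```shell
# Write a minimal open() interposer to hook.c.
cat > hook.c <<'EOF'
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Log the path to stderr, then forward to the real open(). */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    /* Look up the next definition of open() (normally libc's) once. */
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    /* The mode argument is only passed when O_CREAT is set. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    fprintf(stderr, "open: %s\n", path);
    return real_open(path, flags, mode);
}
EOF

# Build it as a shared object, then preload it into the target program:
#   gcc -shared -fPIC -o hook.so hook.c -ldl
#   LD_PRELOAD=./hook.so ./myapp arg1 arg2
```

Logging to stderr keeps the sketch short; writing to a dedicated log file would avoid mixing the trace with the program's own output.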
This has much less overhead.
Note, however, that there are other entry points for opening files, e.g. fopen(), openat(), or one of the many legacy compatibility entry points:
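The exact set depends on your libc version. One way to see what your own system exports, as a sketch assuming glibc and binutils' nm (the helper name filter_open_symbols is made up here), is to filter the dynamic symbol table for open-like names:

```shell
# Keep only open-like entry points from `nm -D` output read on stdin,
# e.g. open, open64, openat, fopen, freopen and their _-prefixed variants.
filter_open_symbols() {
    grep -E ' _*(f|fre)?open(at)?(64)?(@|$)'
}

# Usage; the libc path varies by system, so it is looked up via ldd:
#   libc=$(ldd /bin/sh | awk '/libc\.so/ { print $3; exit }')
#   nm -D --defined-only "$libc" | filter_open_symbols
```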
You may need to hook all of these for completeness; at the very least, the ones not prefixed with _ should be hooked. In particular, be sure to hook fopen separately, as the libc-internal call from fopen() to open() is not intercepted by an LD_PRELOAD library.
A similar caveat applies to strace: there is the openat syscall as well, and depending on your architecture there may be other legacy syscalls. There are not as many as with LD_PRELOAD hooks, though, so if you don't mind the performance hit, it may be the easier option.
Example (assuming 2343 is the process ID):
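A sketch of one way to do that, assuming strace is the tool in question: its -p option attaches to a process that is already running. The helper name trace_pid is made up for illustration, and attaching usually requires the same user or root.

```shell
# Attach strace to a running process and show its file-related syscalls.
# Usage: trace_pid PID   (stop tracing with Ctrl-C; the process keeps running)
trace_pid() {
    strace -f -e trace=file -p "$1"
}

# Example:
#   trace_pid 2343
```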
What I use is something like:
You can then
to get a list of all the files that the program opened.
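As a sketch of that two-step workflow, assuming the log comes from strace -o: the helper below (list_opened_files is a made-up name, as are trace.log and ./myapp) pulls the path out of every open/openat call that returned a valid file descriptor and prints each path once. Failed opens such as ENOENT are skipped.

```shell
# Print the unique paths that were opened successfully, given a strace log.
# Handles lines with or without the leading PID that `strace -f -o` adds.
list_opened_files() {
    sed -nE 's/^([0-9]+ +)?open(at)?\([^"]*"([^"]*)".*\) = [0-9]+.*$/\3/p' "$1" |
        sort -u
}

# Usage (placeholder names):
#   strace -f -o trace.log ./myapp arg1 arg2
#   list_opened_files trace.log
```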