Guaranteed file deletion upon program termination

Posted 2020-02-08 06:19

Question:

Win32's CreateFile has FILE_FLAG_DELETE_ON_CLOSE, but I'm on Linux.

I want to open a temporary file which will always be deleted upon program termination. I could understand that in the case of a program crash it may not be practical to guarantee this, but in any other case I'd like it to work.

I know about RAII. I know about signals. I know about atexit(3). I know I can open the file and delete it immediately and the file will remain accessible until the file descriptor is closed (which even handles a crash). None of these seem like a complete and straightforward solution:

  1. RAII: been there, done that: I have an object whose destructor deletes the file, but the destructor is not called if the program is terminated by a signal.
  2. signals: I'm writing a low-level library which makes registering a signal handler a tricky proposition. For example, what if the application uses signals itself? I don't want to step on any toes. I might consider some clever use of sigaction(2) to cope...but haven't put enough thought into this possibility yet.
  3. atexit(3): apparently useless, since it isn't called during abnormal termination (e.g. via a signal).
  4. preemptive unlink(2): this is pretty good except that I need the file to remain visible in the filesystem (otherwise the system is harder to monitor/troubleshoot).

What would you do here?

Further Explanation

I elided one detail in my original post which I now realize I should have included. The "file" in this case is not strictly a normal file, but rather is a POSIX Message Queue. I create it via mq_open(). It can be closed via mq_close() or close() (the former is an alias for the latter on my system). It can be removed from the system via mq_unlink(). All of this makes it analogous to a regular file, except that I cannot choose the directory in which the file resides. This makes the current most popular answer (placing the file in /tmp) unworkable, because the "file" is created by the system in a virtual filesystem with very limited capacity. (I've mounted the virtual filesystem in /dev/mqueue, following the example in man mq_overview.)

This also explains why I need the name to remain visible (making the immediate-unlink approach unworkable): the "file" must be shared between two or more processes.

Answer 1:

The requirement that the name remains visible while the process is running makes this hard to achieve. Can you revisit that requirement?

If not, then there probably isn't a perfect solution. I would consider combining a signal handling strategy with what Kamil Kisiel suggests. You could keep track of the signal handlers that were installed before you install your own. If the existing disposition is SIG_IGN, you wouldn't normally install your own handler; if it is SIG_DFL, you would remember that; if it is something else (a user-defined signal handler), you would remember that pointer and install your own. When your handler is called, you'd do whatever you need to do and then call the remembered handler, thus chaining the handlers. You would also install an atexit() handler. You would also document that you do this, and the signals for which you do it.

Note that signal handling is an imperfect strategy: SIGKILL cannot be caught, so neither your signal handler nor the atexit() handler runs, and the file is left behind.
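The chaining described above could be sketched roughly as follows. This is only an illustration under assumptions: the names install_chained_handler, chained_handler, cleanup_path and MAX_SIGNALS are hypothetical, a single regular path is tracked (for a message queue the cleanup would be mq_unlink() instead, whose signal-safety you would have to check for your system), and the sketch is not thread-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_SIGNALS 65   /* assumption: large enough for Linux signal numbers */

static char cleanup_path[256];                          /* hypothetical: the one file we own */
static struct sigaction previous_action[MAX_SIGNALS];   /* what was installed before us */

/* Runs on delivery: remove the file, then defer to the previous disposition. */
static void chained_handler(int signo, siginfo_t *info, void *context)
{
    const struct sigaction *old = &previous_action[signo];

    if (cleanup_path[0] != '\0')
        unlink(cleanup_path);            /* unlink(2) is async-signal-safe */

    if (old->sa_flags & SA_SIGINFO) {
        old->sa_sigaction(signo, info, context);    /* chain to a 3-argument handler */
    } else if (old->sa_handler != SIG_DFL && old->sa_handler != SIG_IGN) {
        old->sa_handler(signo);                     /* chain to a 1-argument handler */
    } else {
        /* Previous disposition was the default: restore it and re-raise so the
           default action still happens once this handler returns. */
        sigaction(signo, old, NULL);
        raise(signo);
    }
}

/* Returns 1 if installed, 0 if the signal was ignored and left alone, -1 on error. */
int install_chained_handler(int signo, const char *path)
{
    struct sigaction current, replacement;

    assert(path != NULL);
    if (signo <= 0 || signo >= MAX_SIGNALS)
        return -1;
    if (sigaction(signo, NULL, &current) != 0)      /* query without changing anything */
        return -1;
    if (!(current.sa_flags & SA_SIGINFO) && current.sa_handler == SIG_IGN)
        return 0;

    snprintf(cleanup_path, sizeof cleanup_path, "%s", path);
    previous_action[signo] = current;

    memset(&replacement, 0, sizeof replacement);
    replacement.sa_sigaction = chained_handler;
    replacement.sa_flags = SA_SIGINFO;
    sigemptyset(&replacement.sa_mask);
    return sigaction(signo, &replacement, NULL) == 0 ? 1 : -1;
}
```

A library would call something like install_chained_handler(SIGTERM, path) once per signal it documents. If the application installs its own handler afterwards, this one is replaced, which is one more reason to document the behaviour.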

David Segond's suggestion - a temporary file name daemon - is interesting. For simple processes, it is sufficient; if the process requesting the temporary file forks and expects the child to own the file thereafter (and exits) then the daemon has a problem detecting when the last process using it dies - because it doesn't automatically know the processes that have it open.



Answer 2:

If you're just making a temporary file, just create it in /tmp or a subdirectory thereof. Then make a best effort to remove it when done, through atexit(3) or similar. As long as you use unique names picked through mkstemp(3) or similar, a file that fails to be deleted because of a program crash will not be read again by subsequent runs, so it does no harm beyond taking up space.
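A minimal sketch of that best-effort approach, assuming a single temporary file per process; the function names and the "/tmp/example-XXXXXX" template are made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static char temp_path[64];   /* hypothetical module state: empty means "no file" */

/* Path of the current temporary file, or "" if there is none. */
const char *temp_file_path(void)
{
    return temp_path;
}

/* Best-effort removal; registered with atexit(3) and safe to call more than once. */
void remove_temp_file(void)
{
    if (temp_path[0] != '\0') {
        unlink(temp_path);
        temp_path[0] = '\0';
    }
}

/* Creates a uniquely named file under /tmp and returns its descriptor, or -1. */
int open_temp_file(void)
{
    static int registered = 0;
    int fd;

    assert(temp_path[0] == '\0');   /* this sketch tracks only one file at a time */
    snprintf(temp_path, sizeof temp_path, "/tmp/example-XXXXXX");
    fd = mkstemp(temp_path);        /* replaces XXXXXX and opens with O_EXCL */
    if (fd < 0) {
        temp_path[0] = '\0';
        return -1;
    }
    if (!registered && atexit(remove_temp_file) == 0)
        registered = 1;             /* runs on exit() or return from main, not on signals */
    return fd;
}
```

The atexit() hook covers normal termination only; after a crash the file stays in /tmp under its unique name until the system cleans it up.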

At that point it's just a system-level problem of keeping /tmp clean. Most distros wipe it on boot or shutdown, or run a regular cronjob to delete old files.



Answer 3:

Maybe someone has suggested this already and I failed to spot it. Given all your requirements, the best I can think of is to communicate the filename to a parent process, such as a start script, which cleans up after the process dies if the process failed to do so itself. This is essentially a watchdog, although watchdogs are more commonly used to kill and/or restart a process that has somehow failed.

If your parent process dies as well, you're pretty much out of luck, but most script environments are fairly robust and rarely die unless the script is broken, which is often easier to keep correct than a program.



Answer 4:

In the past, I have built a "temporary file manager" that kept track of temporary files.

One would request a temporary file name from the manager and this name was registered.

Once you don't need the temporary file name any more, you inform the manager and the filename is unregistered.

Upon receipt of a termination signal, all the registered temporary files were destroyed.

Temporary filenames were UUID based to avoid collisions.
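The register/unregister/destroy cycle might look something like the sketch below. It is not the original manager: the tempfile_* names and the fixed-size table are assumptions, the caller supplies the (for example UUID-based) name, and the table is not protected against a signal arriving in the middle of an update.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_TEMP_FILES 32
#define MAX_NAME_LEN   128

/* Hypothetical registry: an empty string marks a free slot. */
static char registry[MAX_TEMP_FILES][MAX_NAME_LEN];

/* Records a temporary file name; returns 0 on success, -1 if full or too long. */
int tempfile_register(const char *name)
{
    assert(name != NULL && name[0] != '\0');
    if (strlen(name) >= MAX_NAME_LEN)
        return -1;
    for (int i = 0; i < MAX_TEMP_FILES; i++) {
        if (registry[i][0] == '\0') {
            snprintf(registry[i], MAX_NAME_LEN, "%s", name);
            return 0;
        }
    }
    return -1;
}

/* Forgets a name once the caller no longer needs it; returns -1 if unknown. */
int tempfile_unregister(const char *name)
{
    assert(name != NULL && name[0] != '\0');
    for (int i = 0; i < MAX_TEMP_FILES; i++) {
        if (strcmp(registry[i], name) == 0) {
            registry[i][0] = '\0';
            return 0;
        }
    }
    return -1;
}

/* Deletes every file still registered. It only calls unlink(2), which is
   async-signal-safe, so a termination-signal handler could call it. */
void tempfile_destroy_all(void)
{
    for (int i = 0; i < MAX_TEMP_FILES; i++) {
        if (registry[i][0] != '\0') {
            unlink(registry[i]);
            registry[i][0] = '\0';
        }
    }
}
```

A termination handler (and an atexit() hook) would then just call tempfile_destroy_all().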



Answer 5:

You could have the process fork after creating the file and then wait for the child to exit, at which point the parent can unlink the file and exit.
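As a rough sketch of this idea (the name supervise_and_unlink is hypothetical, and for a message queue the unlink() call would become mq_unlink()): the original process turns into a small supervisor, and the real work continues in the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call after the file has been created.
   Returns 0 in the child, which carries on with the real work.
   Returns -1 if fork() failed (no supervision in that case).
   Never returns in the parent: it waits, unlinks, and exits. */
int supervise_and_unlink(const char *path)
{
    pid_t child;
    int status = 0;

    assert(path != NULL);
    child = fork();
    if (child < 0) {
        perror("fork");
        return -1;
    }
    if (child == 0)
        return 0;                       /* worker process */

    /* Supervisor: block until the worker is gone, however it died. */
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            break;
    }
    unlink(path);

    /* Pass the worker's outcome on to whoever started us. */
    if (WIFSIGNALED(status))
        exit(128 + WTERMSIG(status));
    exit(WEXITSTATUS(status));
}
```

Because the supervisor does nothing but wait, the file is removed even when the worker is killed with SIGKILL. It is still left behind if the supervisor itself is killed.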



Answer 6:

I just joined stackoverflow and found you here :)

If your problem is managing mq files and keeping them from piling up, you don't really need to guarantee file deletion upon termination. If you just want to keep useless files from piling up, then keeping a journal may be all you need. Add an entry to the journal file after an mq is opened and another when it is closed; when your library is initialized, check the journal for inconsistencies and take whatever action is needed to correct them. If you worry about crashing while mq_open/mq_close is being called, you can also add a journal entry just before those functions are called.
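One possible shape for such a journal, purely as a sketch: a text file with one "open NAME" or "close NAME" line per event, replayed at library initialization to find names that were opened but never closed. The line format, function names and size limits are assumptions, and names are assumed to contain no whitespace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define JOURNAL_NAME_LEN 64

/* Appends one "<event> <name>" line; event is "open" or "close" in this sketch.
   fclose() hands the data to the kernel, which survives a process crash
   (surviving a power loss would additionally need fsync). */
int journal_append(const char *journal_path, const char *event, const char *name)
{
    FILE *f;
    int ok;

    assert(journal_path != NULL && event != NULL && name != NULL);
    f = fopen(journal_path, "a");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%s %s\n", event, name) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Replays the journal and fills stale[] with names that have an "open"
   without a later "close". Returns how many were found (at most max). */
int journal_find_stale(const char *journal_path,
                       char stale[][JOURNAL_NAME_LEN], int max)
{
    char event[8], name[JOURNAL_NAME_LEN];
    int count = 0;
    FILE *f = fopen(journal_path, "r");

    if (f == NULL)
        return 0;                       /* no journal yet: nothing to recover */
    while (fscanf(f, "%7s %63s", event, name) == 2) {
        if (strcmp(event, "open") == 0) {
            if (count < max)
                strcpy(stale[count++], name);
        } else if (strcmp(event, "close") == 0) {
            for (int i = 0; i < count; i++) {
                if (strcmp(stale[i], name) == 0) {
                    if (i != count - 1)             /* fill the gap with the last entry */
                        strcpy(stale[i], stale[count - 1]);
                    count--;
                    break;
                }
            }
        }
    }
    fclose(f);
    return count;
}
```

At initialization the library would call mq_unlink() on each stale name and then truncate or rewrite the journal.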



Answer 7:

  • Have a book-keeping directory for temporary files under your dot-directory.
  • When creating a temp file, first create a book-keeping file in the book-keeping directory that contains the path or UUID of your to-be temp file.
  • Create that temp file.
  • When temp-file is deleted, then delete the book-keeping file.
  • When the program starts, scan the book-keeping directory for any files containing paths to temporary files and try to delete those temporary files if found, then delete the book-keeping files.
  • (Log noisily if any step fails.)
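The steps above can be sketched in a few dozen lines, although a production version would need far more error handling. The bookkeeping_* names, the marker-file format (one path per file) and the directory layout are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define BK_PATH_LEN 512

/* Write <dir>/<id> containing temp_path. Call this BEFORE creating the temp file. */
int bookkeeping_add(const char *dir, const char *id, const char *temp_path)
{
    char marker[BK_PATH_LEN];
    FILE *f;

    assert(dir != NULL && id != NULL && temp_path != NULL);
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;
    snprintf(marker, sizeof marker, "%s/%s", dir, id);
    f = fopen(marker, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%s\n", temp_path);
    return fclose(f) == 0 ? 0 : -1;
}

/* Call after the temp file has been deleted normally. */
int bookkeeping_remove(const char *dir, const char *id)
{
    char marker[BK_PATH_LEN];

    snprintf(marker, sizeof marker, "%s/%s", dir, id);
    return unlink(marker);
}

/* Startup scan: delete leftover temp files and their markers.
   Returns the number of markers cleaned up, or -1 if dir cannot be read. */
int bookkeeping_recover(const char *dir)
{
    char marker[BK_PATH_LEN], temp_path[BK_PATH_LEN];
    struct dirent *entry;
    int recovered = 0;
    DIR *d = opendir(dir);

    if (d == NULL)
        return -1;
    while ((entry = readdir(d)) != NULL) {
        FILE *f;

        if (entry->d_name[0] == '.')
            continue;                               /* skips "." and ".." */
        snprintf(marker, sizeof marker, "%s/%s", dir, entry->d_name);
        f = fopen(marker, "r");
        if (f == NULL)
            continue;
        if (fgets(temp_path, sizeof temp_path, f) != NULL) {
            temp_path[strcspn(temp_path, "\n")] = '\0';
            if (unlink(temp_path) != 0 && errno != ENOENT)
                perror(temp_path);                  /* log noisily */
        }
        fclose(f);
        if (unlink(marker) == 0)
            recovered++;
        else
            perror(marker);
    }
    closedir(d);
    return recovered;
}
```

Writing the marker before creating the temp file means a crash between the two steps leaves only a marker pointing at nothing, which the startup scan tolerates.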

I don't see a simpler way to do it. This is the boilerplate any production-quality program must go through; easily 500+ lines.



Answer 8:

Do you really need the name to remain visible?

Suppose you take the option of immediately unlinking the file. Then:

  • preemptive unlink(2): this is pretty good except that I need the file to remain visible in the filesystem (otherwise the system is harder to monitor/troubleshoot).

    You can still debug on a deleted file, since it will still be visible under /proc/$pid/fd/. As long as you know the pids of your processes, enumerating their open files should be easy.

  • the names need to remain visible during normal operation because they are shared between programs.

    You can still share the deleted open file between processes by passing around the file descriptor over Unix domain sockets. See "Portable way to pass file descriptor between different processes" for more information.
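For reference, descriptor passing over a Unix domain socket uses sendmsg()/recvmsg() with an SCM_RIGHTS control message. Below is a minimal sketch; the helper names send_fd and recv_fd are made up here, and whether a message queue descriptor can be passed this way depends on it being an ordinary file descriptor, as the question says it is on the asker's system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor, aligned for struct cmsghdr. */
union fd_control {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Sends fd_to_send over the Unix domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd_to_send)
{
    char dummy = 'F';                   /* at least one byte of real data must travel */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_control control;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd_to_send >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&control, 0, sizeof control);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;       /* "the payload is file descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one descriptor from sock. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_control control;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    memset(&control, 0, sizeof control);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;                          /* a new descriptor for the same open file */
}
```

The receiver gets its own descriptor number referring to the same open file description, so the file can stay unlinked while still being shared.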