Note: Please read to the end before marking this as duplicate. While it's similar, the scope of what I'm looking for in an answer extends beyond what the previous question was asking for.
Widespread practice, which I tend to agree with, tends to be treating close
purely as a resource-deallocation function for file descriptors rather than a potential IO operation with meaningful failure cases. And indeed, prior to the resolution of issue 529, POSIX left the state of the file descriptor (i.e. whether it was still allocated or not) unspecified after errors, making it impossible to respond portably to errors in any meaningful way.
However, a lot of GNU software goes to great lengths to check for errors from close
, and the Linux man page for close
calls failure to do so "a common but nevertheless serious programming error". NFS and quotas are cited as circumstances under which close
might produce an error but does not give details.
What are the situations under which close
might fail, on real-world systems, and are they relevant today? I'm particularly interested in knowing whether there are any modern systems where close
fails for any non-NFS, non-device-node-specific reasons, and as for NFS or device-related failures, under what conditions (e.g. configurations) they might be seen.
Once upon a time (24 march, 2007), Eric Sosman had the following tale to share in the comp.lang.c newsgroup:
(Let me begin by confessing to a little white lie: It wasn't
fclose() whose failure went undetected, but the POSIX close()
function; this part of the application used POSIX I/O. The lie
is harmless, though, because the C I/O facilities would have
failed in exactly the same way, and an undetected failure would
have had the same consequences. I'll describe what happened in
terms of C's I/O to avoid dwelling on POSIX too much.)
The situation was very much as Richard Tobin described.
The application was a document management system that loaded a
document file into memory, applied the user's edits to the in-
memory copy, and then wrote everything to a new file when told
to save the edits. It also maintained a one-level "old version"
backup for safety's sake: the Save operation wrote to a temp
file, and then if that was successful it deleted the old backup,
renamed the old document file to the backup name, and renamed the
temp file to the document. bak -> trash, doc -> bak, tmp -> doc.
The write-to-temp-file step checked almost everything. The
fopen(), obviously, but also all the fwrite()s and even a final
fflush() were checked for error indications -- but the fclose()
was not. And on one system it happened that the last few disk
blocks weren't actually allocated until fclose() -- the I/O
system sat atop VMS' lower-level file access machinery, and a
little bit of asynchrony was inherent in the arrangement.
The customer's system had disk quotas enabled, and the
victim was right up close to his limit. He opened a document,
edited for a while, saved his work thus far, and exceeded his
quota -- which went undetected because the error didn't appear
until the unchecked fclose(). Thinking that the save succeeded,
the application discarded the old backup, renamed the original
document to become the backup, and renamed the truncated temp
file to be the new document. The user worked a little longer
and saved again -- same thing, except you'll note that this time
the only surviving complete file got deleted, and both the
backup and the master document file are truncated. Result: the
whole document file became trash, not just the latest session
of work but everything that had gone before.
As Murphy would have it, the victim was the boss of the
department that had purchased several hundred licenses for our
software, and I got the privilege of flying to St. Louis to be
thrown to the lions.
[...]
In this case, the failure of fclose() would (if detected) have
stopped the delete-and-rename sequence. The user would have been
told "Hey, there was a problem saving the document; do something
about it and try again. Meanwhile, nothing has changed on disk."
Even if he'd been unable to save his latest batch of work, he would
at least not have lost everything that went before.
Consider the inverse of your question: "Under what situations can we guarantee that close
will succeed?" The answer is:
- when you call it correctly, and
- when you know that the file system the file is on does not return errors from
close
in this OS and Kernel version
If you are convinced that you program doesn't have any logic errors and you have complete control over the Kernel and file system, then you don't need to check the return value of close
.
Otherwise, you have to ask yourself how much you care about diagnosing problems with close
. I think there is value in checking and logging the error for diagnostic purposes:
- If a coder makes a logic error and passes an invalid fd to
close
, then you'll be able to quickly track it down. This may help to catch a bug early before it causes problems.
- If a user runs the program in an environment where
close
does return an error when (for example) data was not flushed, then you'll be able to quickly diagnose why the data got corrupted. It's an easy red flag because you know the error should not occur.