This manpage for the dup2 system call says:
EBUSY (Linux only) This may be returned by dup2() or dup3() during a race condition with open(2) and dup().
What race condition does it talk about, and what should I do if dup2 gives an EBUSY error? Should I retry, like in the case of EINTR?
I'm not entirely aware of the choices Linux made, but the comment from the Linux kernel in the other answer points to stuff I worked on in OpenBSD 13 years ago, so here's my attempt at remembering what the hell was going on.
Because of the way open is implemented, it first allocates a file descriptor, then it actually tries to finish the opening operation with the file descriptor table unlocked. One reason might be that we don't want to cause the side effects of open (the simplest would be changing atime on the file, but opening devices, for example, can have much more severe side effects) if it is going to fail because we're out of file descriptors. The same applies to all other operations that allocate file descriptors; when you read the text below, just substitute open with "any system call that allocates file descriptors". I don't remember if this is mandated by POSIX or just The Way Things Have Always Been Done.

open can allocate memory, go down to the file system and do a bunch of things that can potentially block for a long time. In the worst case, for filesystems like fuse, it might even go back up to userland. For that reason (and others) we don't want to hold the file descriptor table locked during the whole open operation. Locks inside the kernel are quite bad to hold while sleeping, doubly so if the completion of the locked operation might require interaction with userland[1].

The problem happens when someone calls
open in one thread (or in a process that shares the same file descriptor table), it allocates a file descriptor but hasn't finished with it yet, and at the same time another thread does a dup2 pointing to the same file descriptor that open just got. Since an unfinished file descriptor is still invalid (for example, read and write will return EBADF when you try to use it), we can't actually close it just yet.

In OpenBSD this is solved by keeping track of allocated but not yet open file descriptors with complex reference counting. Most operations will just pretend the file descriptor isn't there (but it isn't allocatable either) and will just return
EBADF. But for dup2 we can't pretend it isn't there, because it is. The end result is that if two threads concurrently call open and dup2, open will actually perform a full open operation on the file, but since dup2 won the race for the file descriptor, the last thing open does is decrement the reference count on the file it just allocated and close it again. Meanwhile dup2 won the race and pretended to close the file descriptor that open got (which it didn't actually do; it was open that did it). It doesn't really matter which behavior the kernel chooses, since in both cases this is a race that will lead to unexpected behavior for either open or dup2. At best, Linux returning EBUSY just shrinks the window for the race, but the race is still there: there's nothing preventing the dup2 call from happening just as open is returning in the other thread and replacing the file descriptor before the caller of open has a chance to use it.

The error in your question will most likely happen when you hit this race. To avoid it, do not
dup2 to a file descriptor you don't know the state of unless you are sure that no one else will be accessing the file descriptor table at the same time. And the only way to be sure is to be the only thread running (file descriptors are opened behind your back by libraries all the time) or to know exactly which file descriptor you're overwriting. The reason dup2 over an unallocated file descriptor is allowed in the first place is that it's a common idiom to close fds 0, 1 and 2 and dup2 /dev/null into them.

On the other hand, not closing file descriptors before
dup2 will lose the error return from close. I wouldn't worry about that though, since the errors from close are stupid and shouldn't be there in the first place: Handling C Read Only File Close Errors

For another example of unexpected behavior of threads and how file descriptors behave strangely because of what I've been talking about here, see this question: Socket descriptor not getting released on doing 'close ()' for a multi-threaded UDP client

Here's some example code to trigger this:
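What follows is a minimal sketch of such a trigger, not the original program. It assumes a POSIX system with pthreads; the function names race_dup2_against_open and blocked_open are made up for this illustration, the one-second sleep is a crude stand-in for real synchronization, and the sketch opens the write side of the FIFO itself instead of relying on a shell to unblock the reader.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

/* Thread body: blocks inside open() until a writer opens the FIFO.
 * By then the kernel has already reserved a descriptor for it. */
static void *blocked_open(void *arg)
{
	int fd = open((const char *)arg, O_RDONLY);

	if (fd != -1)
		close(fd);
	return NULL;
}

/* Hypothetical helper: returns 0 if dup2() succeeded, the errno it
 * failed with (EBUSY is what Linux is expected to give), or -1 if
 * the setup itself failed. */
int race_dup2_against_open(const char *path)
{
	pthread_t t;
	int src, target, wfd, result;

	assert(path != NULL);
	unlink(path);
	if (mkfifo(path, 0600) == -1)
		return -1;
	if ((src = open("/dev/null", O_RDONLY)) == -1)
		return -1;

	/* Find the lowest free descriptor; the next open() will get it,
	 * assuming nothing else in the process opens files meanwhile. */
	if ((target = dup(src)) == -1)
		return -1;
	close(target);

	if (pthread_create(&t, NULL, blocked_open, (void *)path) != 0)
		return -1;
	sleep(1);	/* crude: let the thread block inside open() */

	/* target is now allocated but not yet open. */
	result = dup2(src, target) == -1 ? errno : 0;

	/* Opening the write side lets the blocked open() finish. */
	wfd = open(path, O_WRONLY);
	pthread_join(t, NULL);
	if (wfd != -1)
		close(wfd);
	if (result == 0)
		close(target);	/* may already be closed by the thread */
	close(src);
	unlink(path);
	return result;
}
```

Calling race_dup2_against_open("xxx") from a small main() and printing strerror() of a nonzero result is enough to see which behavior a given system chooses.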
A FIFO is the standard method to cause open to block for as long as you wish. As expected, this works silently on OpenBSD, and on Linux dup2 returns EBUSY. On MacOS, for some reason, it kills the shell where I did "echo foo > xxx", while a normal program that just opens it for writing works fine. I have no idea why.

[1] An anecdote here. I've been involved in writing a fuse-like filesystem used for an AFS implementation. One bug we had was that we held a file object lock while calling into userland. The locking protocol for directory entry lookups requires you to hold the directory lock, then look up the directory entry, lock the object under that directory entry and then release the directory lock. Since we held the file object lock, some other process came in and tried to look up the file, which led that process to sleep waiting for the file lock while still holding the directory lock. Another process came in, tried to look up the directory, and ended up holding the lock of the parent directory. Long story short, we ended up with a chain of locks held all the way up to the root directory. Meanwhile the filesystem daemon was still talking to the server over the network. For some reason the network operation failed and the filesystem daemon needed to log an error message. To do that it had to read some locale database. And to do that it needed to open a file using the full path. But since the root directory was locked by someone else, the daemon waited for that lock. And we had a deadlock chain 8 locks long. That's why the kernel often performs complex contortionist gymnastics to avoid holding locks during long operations, especially filesystem operations.
There is an explanation in fs/file.c, do_dup2():

Looks like EBUSY is returned when the descriptor to be freed is in an incomplete state, still being opened (fd_is_open, but not present in the fdtable).

EDIT (more info and do want bounty)
In order to understand how !tofree && fd_is_open(fd, fdt) can happen, let's see how files are opened. Here is a simplified version of sys_open:

Basically two very important things happen: a file descriptor is allocated, and only then is the file actually opened by the VFS. These two operations modify the fdt of the process. They both use a lock, so nothing bad is to be expected inside those two calls.

In order to keep track of which
fds have been allocated, the fdt uses a bit vector called open_fds. After get_unused_fd_flags(), the fd has been allocated and the corresponding bit set in open_fds. The lock on the fdt has been released, but the real VFS job hasn't been done yet.

At this precise moment, another thread (or another process in the case of a shared
fdt) can call dup2, which will not block because the locks have been released. If dup2 took its normal path here, the fd would be replaced, but fd_install would still be run for the old file. Hence the check and the return of EBUSY.

I found additional info on this race condition in the comments of
fd_install(), which confirms my explanation:
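As a rough illustration of the sequence described in this answer, here is a toy userspace model. It is not kernel code: toy_fdtable, toy_alloc_fd, toy_fd_install and toy_dup2 are names invented for this sketch, there is no locking, and only the check in toy_dup2 is meant to mirror the !tofree && fd_is_open(fd, fdt) condition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_FDS 16

/* Toy stand-in for the fdtable: an allocation bitmap plus a slot array. */
struct toy_fdtable {
	unsigned int open_fds;        /* bit n set: fd n is allocated */
	const void *fd[TOY_MAX_FDS];  /* non-NULL: fd n has a file installed */
};

/* First half of open(): reserve the lowest free fd, install nothing yet. */
int toy_alloc_fd(struct toy_fdtable *t)
{
	for (int n = 0; n < TOY_MAX_FDS; n++) {
		if (!(t->open_fds & (1u << n))) {
			t->open_fds |= 1u << n;
			return n;
		}
	}
	return -EMFILE;
}

/* Second half of open(): publish the finished file in the reserved slot. */
void toy_fd_install(struct toy_fdtable *t, int n, const void *file)
{
	assert(t->open_fds & (1u << n));	/* slot must have been reserved */
	t->fd[n] = file;
}

/* dup2()-like replacement of newfd that refuses half-open slots. */
int toy_dup2(struct toy_fdtable *t, int oldfd, int newfd)
{
	if (oldfd < 0 || oldfd >= TOY_MAX_FDS ||
	    newfd < 0 || newfd >= TOY_MAX_FDS || !t->fd[oldfd])
		return -EBADF;
	/* Allocated in the bitmap but no file installed yet: an open()
	 * is still in flight on this fd, so refuse instead of replacing. */
	if (!t->fd[newfd] && (t->open_fds & (1u << newfd)))
		return -EBUSY;
	t->fd[newfd] = t->fd[oldfd];
	t->open_fds |= 1u << newfd;
	return newfd;
}
```

In this model, calling toy_dup2 on an fd between toy_alloc_fd and toy_fd_install yields -EBUSY, while the same call after toy_fd_install replaces the slot as usual.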