I've been looking at glibc/nptl's implementation of cancellation points and comparing it to POSIX, and unless I'm mistaken it's completely wrong. The basic model it uses is:
    int oldtype = LIBC_ASYNC_CANCEL();  /* switch to asynchronous cancellation mode */
    int result = INLINE_SYSCALL(...);
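    /* <-- a cancellation request acted on here, after the syscall has
           already returned, still kills the thread, with `result` lost */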
    LIBC_CANCEL_RESET(oldtype);
According to POSIX:
The side-effects of acting upon a cancellation request while suspended during a call of a function are the same as the side-effects that may be seen in a single-threaded program when a call to a function is interrupted by a signal and the given function returns [EINTR]. Any such side-effects occur before any cancellation cleanup handlers are called.
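For comparison, here is the guarantee the signal analogy gives in a single-threaded program: an interrupted call either completes or fails cleanly with [EINTR], and never half-creates a resource. A minimal illustration (assumes a FIFO named ./fifo created with mkfifo, so the open() blocks):

    #include <errno.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void on_alarm(int sig) { (void)sig; /* just interrupt the call */ }

    int main(void)
    {
        /* No SA_RESTART, so a blocking call fails with EINTR instead
         * of being transparently restarted. */
        struct sigaction sa = {0};
        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, 0);
        alarm(1);

        /* open() on a FIFO with no writer blocks.  If SIGALRM arrives
         * mid-call, we get -1/EINTR and, crucially, no file descriptor
         * was created -- the guarantee POSIX extends to cancellation. */
        int fd = open("fifo", O_RDONLY);
        if (fd < 0 && errno == EINTR)
            printf("interrupted cleanly, no fd leaked\n");
        else if (fd >= 0)
            close(fd);
        return 0;
    }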
My reading of the quoted passage is that if I call open, I can expect it either to get cancelled (along with my whole thread) before it opens a file, or to return a valid file descriptor or -1 with errno set, but never to create a new file descriptor and then lose it into the void. The glibc/nptl implementation of cancellation points, on the other hand, seems to allow a race condition where the cancellation request arrives just after the syscall returns but before LIBC_CANCEL_RESET takes place.
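To make the stakes concrete, here's a minimal sketch of the kind of program that gets burned (the names are mine, and the race window is only a couple of instructions wide, so this won't reproduce on demand):

    #include <fcntl.h>
    #include <pthread.h>
    #include <unistd.h>

    static int the_fd = -1;

    static void cleanup(void *arg)
    {
        (void)arg;
        /* Per the POSIX passage above, by the time this runs, open()
         * has either been cancelled without creating a descriptor, or
         * the_fd holds whatever it returned. */
        if (the_fd >= 0)
            close(the_fd);
    }

    static void *worker(void *arg)
    {
        (void)arg;
        pthread_cleanup_push(cleanup, 0);
        /* open() is a cancellation point.  Under the glibc scheme
         * above, a request landing after the kernel allocates the
         * descriptor but before LIBC_CANCEL_RESET kills the thread
         * while the new fd is still in a register: the assignment
         * below never executes, cleanup() sees the_fd == -1, and the
         * descriptor leaks. */
        the_fd = open("/dev/null", O_RDONLY);
        pthread_cleanup_pop(1);
        return 0;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, 0, worker, 0);
        pthread_cancel(t);   /* races with open() in worker */
        pthread_join(t, 0);
        return 0;
    }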
Am I crazy, or is their implementation really this broken? And if so, does POSIX allow such broken behavior (which seems to render cancellation completely unusable unless you defer it manually), or are they just blatantly ignoring POSIX?
If this behavior is in fact broken, what's the correct way to implement it without such a race condition?
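For reference, the only race-free design I can imagine is to have the cancellation signal handler inspect the interrupted program counter and act on the request only if the thread has not yet passed the syscall instruction; otherwise, leave it pending for the next cancellation point. A rough sketch, assuming Linux/x86-64, with __cp_begin/__cp_end as hypothetical labels that a real implementation would have to provide in assembly around the syscall opcode:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <ucontext.h>

    /* Hypothetical markers: a real implementation would place these as
     * assembly labels immediately before and after the syscall
     * instruction.  Dummy definitions here only so the sketch compiles. */
    const char __cp_begin[1], __cp_end[1];

    static volatile sig_atomic_t cancel_pending;

    /* Handler for the internal cancellation signal (nptl calls it
     * SIGCANCEL).  Assumes Linux/x86-64 register naming. */
    static void cancel_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)si;
        ucontext_t *uc = ctx;
        uintptr_t pc = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];

        if (pc >= (uintptr_t)__cp_begin && pc < (uintptr_t)__cp_end) {
            /* At or before the syscall instruction: no side effects
             * have occurred yet, so it is safe to act on the request
             * now, e.g. by rewriting the saved PC to point at the
             * cancellation unwind path. */
        } else {
            /* The syscall has returned and its side effects (a new fd,
             * bytes transferred, ...) are visible.  Just remember the
             * request; it gets honored at the next cancellation point. */
            cancel_pending = 1;
        }
    }

Is something along these lines really necessary, or is there a simpler fix I'm missing?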