Update, 4/10 2012: Fixed by libc patch
I have a problem canceling threads that are blocked in pthread_cond_wait() when the mutex has the PTHREAD_PRIO_INHERIT attribute set. This only happens on certain platforms, though.
The following minimal example demonstrates this (compile with g++ <filename>.cpp -lpthread):
#include <pthread.h>
#include <unistd.h>   // sleep()
#include <iostream>
pthread_mutex_t mutex;
pthread_cond_t cond;

void clean(void *arg) {
    std::cout << "clean: Unlocking mutex..." << std::endl;
    pthread_mutex_unlock((pthread_mutex_t*)arg);
    std::cout << "clean: Mutex unlocked..." << std::endl;
}

void *threadFunc(void *arg) {
    int ret = 0;
    pthread_mutexattr_t mutexAttr;
    ret = pthread_mutexattr_init(&mutexAttr); std::cout << "ret = " << ret << std::endl;
    //Comment out the following line, and everything works
    ret = pthread_mutexattr_setprotocol(&mutexAttr, PTHREAD_PRIO_INHERIT); std::cout << "ret = " << ret << std::endl;
    ret = pthread_mutex_init(&mutex, &mutexAttr); std::cout << "ret = " << ret << std::endl;
    ret = pthread_cond_init(&cond, 0); std::cout << "ret = " << ret << std::endl;
    std::cout << "threadFunc: Init done, entering wait..." << std::endl;
    pthread_cleanup_push(clean, (void *) &mutex);
    ret = pthread_mutex_lock(&mutex); std::cout << "ret = " << ret << std::endl;
    while(1) {
        ret = pthread_cond_wait(&cond, &mutex); std::cout << "ret = " << ret << std::endl;
    }
    pthread_cleanup_pop(1);
    return 0;
}

int main() {
    pthread_t thread;
    int ret = 0;
    ret = pthread_create(&thread, 0, threadFunc, 0); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Thread created, waiting a bit..." << std::endl;
    sleep(2);
    std::cout << "main: Cancelling threadFunc..." << std::endl;
    ret = pthread_cancel(thread); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Joining threadFunc..." << std::endl;
    ret = pthread_join(thread, NULL); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Joined threadFunc, done!" << std::endl;
    return 0;
}
Every time I run it, main() hangs on pthread_join(). A gdb backtrace shows the following:
Thread 2 (Thread 0xb7d15b70 (LWP 257)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fcf362 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb7fcc9f9 in __condvar_w_cleanup () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:434
#3 0x08048fbe in threadFunc (arg=0x0) at /home/pthread_cond_wait.cpp:22
#4 0xb7fc8ca0 in start_thread (arg=0xb7d15b70) at pthread_create.c:301
#5 0xb7de73ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
Thread 1 (Thread 0xb7d166d0 (LWP 254)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fc9d64 in pthread_join (threadid=3083950960, thread_return=0x0) at pthread_join.c:89
#2 0x0804914a in main () at /home/pthread_cond_wait.cpp:41
If PTHREAD_PRIO_INHERIT isn't set on the mutex, everything works as it should, and the program exits cleanly.
Platforms with problems:
- Embedded AMD Fusion board, running a PTXDist-based 32-bit Linux 3.2.9-rt16 (with RT patch 16). We are using the newest OSELAS i686 cross toolchain (2011.11.1), with gcc 4.6.2 / glibc 2.14.1 / binutils 2.21.1a / kernel 2.6.39.
- The same board with the 2011.03.1 toolchain (gcc 4.5.2 / glibc 2.13 / binutils 2.18 / kernel 2.6.36).
Platforms with no problems:
- Our own ARM board, also running a PTXDist Linux (32-bit 2.6.29.6-rt23), using the OSELAS arm-v4t cross toolchain (1.99.3) with gcc 4.3.2 / glibc 2.8 / binutils 2.18 / kernel 2.6.27.
- My laptop (Intel Core i7), running 64-bit Ubuntu 11.04 (virtualized / kernel 2.6.38.15-generic), gcc 4.5.2 / eglibc 2.13-0ubuntu13.1 / binutils 2.21.0.20110327.
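Since the versions listed above are the toolchain versions, it may be worth confirming which C library each target actually runs. The following is a minimal sketch, assuming a glibc-based system (the header gnu/libc-version.h is glibc-specific); the helper name describePlatform is hypothetical and only used for illustration.

```cpp
#include <gnu/libc-version.h>  // gnu_get_libc_version(), gnu_get_libc_release() (glibc only)
#include <unistd.h>            // defines _POSIX_THREAD_PRIO_INHERIT where supported
#include <string>

// Hypothetical helper: describes the C library in use at run time and
// whether the headers advertise priority-inheritance mutexes.
std::string describePlatform() {
    std::string out = "glibc ";
    out += gnu_get_libc_version();   // version of the libc loaded at run time
    out += " (";
    out += gnu_get_libc_release();   // release status string reported by glibc
    out += ")";
#if defined(_POSIX_THREAD_PRIO_INHERIT) && _POSIX_THREAD_PRIO_INHERIT > 0
    out += ", PTHREAD_PRIO_INHERIT advertised";
#else
    out += ", PTHREAD_PRIO_INHERIT not advertised";
#endif
    return out;
}
```

Printing the result of describePlatform() on each board would show whether the library on the target matches the one the toolchain was built with.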
I have been looking around the net for solutions, and have come across a few patches that I've tried, but without any effect:
- Making the condition variables priority inheritance aware.
- Handling EAGAIN from FUTEX_WAIT_REQUEUE_PI
Are we doing something wrong in our code, which just happens to work on certain platforms, or is this a bug in the underlying systems? If anyone has any idea about where to look, or knows of any patches or similar to try out, I'd be happy to hear about it.
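One possible workaround, sketched here under the assumption that the hang is confined to the cancellation cleanup path seen in the backtrace (__condvar_w_cleanup), is to avoid pthread_cancel() altogether and ask the thread to stop through a flag guarded by the same mutex. All names below (gMutex, gCond, gStopRequested, worker, runCooperativeShutdown) are hypothetical, and this has not been verified on the affected boards.

```cpp
#include <pthread.h>

// Hypothetical names throughout; a sketch, not a verified fix.
static pthread_mutex_t gMutex;
static pthread_cond_t gCond;
static bool gStopRequested = false;  // guarded by gMutex

// Waits on the condition variable until a stop is requested.
static void *worker(void *) {
    pthread_mutex_lock(&gMutex);
    while (!gStopRequested) {
        pthread_cond_wait(&gCond, &gMutex);
    }
    pthread_mutex_unlock(&gMutex);
    return 0;
}

// Creates a PTHREAD_PRIO_INHERIT mutex, starts the worker, requests a stop
// and joins it. Returns 0 on success, or the first non-zero pthread error.
int runCooperativeShutdown() {
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret != 0) return ret;
    ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (ret == 0) ret = pthread_mutex_init(&gMutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (ret != 0) return ret;

    ret = pthread_cond_init(&gCond, 0);
    if (ret != 0) {
        pthread_mutex_destroy(&gMutex);
        return ret;
    }
    gStopRequested = false;

    pthread_t thread;
    ret = pthread_create(&thread, 0, worker, 0);
    if (ret == 0) {
        // Set the flag under the mutex so the worker cannot miss the request,
        // even if it has not reached pthread_cond_wait() yet.
        pthread_mutex_lock(&gMutex);
        gStopRequested = true;
        pthread_cond_signal(&gCond);
        pthread_mutex_unlock(&gMutex);
        ret = pthread_join(thread, 0);
    }
    pthread_cond_destroy(&gCond);
    pthread_mutex_destroy(&gMutex);
    return ret;
}
```

Because the worker leaves pthread_cond_wait() through a normal wakeup, the cancellation cleanup code is never entered. This only sidesteps the problem and does not explain it.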
Thanks!
Updates:
- libc-help mailing list discussion
- glibc bug report