Suppose a process creates a mutex in shared memory, locks it, and then dumps core while the mutex is still locked.
How can another process detect that the mutex is locked but not owned by any living process?
You should use a semaphore as provided by the operating system.
The operating system releases all resources that a process has open whether it dies or exits gracefully.
How about file-based locking (using flock(2))? These locks are automatically released when the process holding them dies. Demo program:
Output (I've truncated the PIDs and times a bit for clarity):
What happens is that the first program acquires the lock and starts to sleep for 5 seconds. After 2 seconds, a second instance of the program is started which blocks while trying to acquire the lock. 3 seconds later, the first program segfaults (bash doesn't tell you this until later though) and immediately, the second program gets the lock and continues.
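For illustration, here is a minimal sketch in C of the behaviour described above. It is not the demo program itself: the helper names `acquire_file_lock` and `lock_released_after_owner_crash`, the pipe used for synchronisation, and the use of SIGKILL to simulate the crash are all assumptions of mine.

```c
#define _GNU_SOURCE            /* exposes flock() under strict -std modes */
#include <fcntl.h>
#include <signal.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open (creating if needed) the lock file and take an exclusive flock().
 * Blocks until the lock is free. Returns the descriptor holding the lock,
 * or -1 on error. Hypothetical helper name. */
int acquire_file_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Self-check (hypothetical): a child takes the lock and is killed while
 * holding it. Returns 1 if the parent then obtains the lock and the child
 * really died from a signal, 0 otherwise. */
int lock_released_after_owner_crash(const char *path)
{
    int ready[2];
    if (pipe(ready) != 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0)
        return 0;

    if (pid == 0) {                       /* child: the process that "crashes" */
        close(ready[0]);
        int child_fd = acquire_file_lock(path);
        char ok = (child_fd >= 0);
        if (write(ready[1], &ok, 1) != 1) /* tell the parent the lock is held */
            _exit(2);
        raise(SIGKILL);                   /* simulated crash, no cleanup runs */
        _exit(1);                         /* not reached */
    }

    close(ready[1]);
    char ok = 0;
    ssize_t n = read(ready[0], &ok, 1);   /* wait until the child owns the lock */
    close(ready[0]);

    /* Blocks until the kernel drops the dead child's lock. */
    int fd = (n == 1 && ok) ? acquire_file_lock(path) : -1;

    int status = 0;
    waitpid(pid, &status, 0);
    if (fd < 0)
        return 0;

    flock(fd, LOCK_UN);
    close(fd);
    return WIFSIGNALED(status) ? 1 : 0;
}
```

Calling `lock_released_after_owner_crash("/tmp/demo.lock")` (the path is just an example) should return 1 on a system where flock(2) behaves as described. Note that the lock belongs to the open file description, so the child opens the file itself rather than inheriting a descriptor from the parent.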
If you're working in Linux or something similar, consider using named semaphores instead of (what I assume are) pthreads mutexes. I don't think there is a way to determine the locking PID of a pthreads mutex, short of building your own registration table and also putting it in shared memory.
It seems that the exact answer has been provided in the form of robust mutexes.
According to POSIX, pthread mutexes can be initialised "robust" using pthread_mutexattr_setrobust(). If a process holding the mutex then dies, the next thread to acquire it will receive EOWNERDEAD (but still acquire the mutex successfully) so that it knows to perform any cleanup. It then needs to notify that the acquired mutex is again consistent using pthread_mutex_consistent().
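A minimal sketch of that sequence in C, assuming a platform that provides `pthread_mutexattr_setrobust()` and `pthread_mutex_consistent()`. The helper names (`init_robust_shared_mutex`, `lock_with_recovery`, `demo_owner_death_detected`) are hypothetical, and the anonymous shared mapping plus `fork()` stand in for whatever shared memory segment you actually use. On older glibc versions you may need to link with `-pthread`.

```c
#define _GNU_SOURCE            /* exposes MAP_ANONYMOUS and the robust API */
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Initialise *m as a process-shared, robust mutex. Returns 0 or an errno. */
int init_robust_shared_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock *m. If the previous owner died, *owner_died is set to 1 and the
 * mutex is marked consistent again. Returns 0 when the caller holds the
 * mutex, otherwise an errno value. */
int lock_with_recovery(pthread_mutex_t *m, int *owner_died)
{
    int rc = pthread_mutex_lock(m);
    *owner_died = (rc == EOWNERDEAD);
    if (rc == EOWNERDEAD) {
        /* We own the mutex here; repair the protected shared data first. */
        rc = pthread_mutex_consistent(m);
    }
    return rc;
}

/* Self-check (hypothetical): a child locks the mutex and is killed.
 * Returns 1 if the parent sees EOWNERDEAD, 0 if not, -1 on setup error. */
int demo_owner_death_detected(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    if (init_robust_shared_mutex(m) != 0) {
        munmap(m, sizeof *m);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(m, sizeof *m);
        return -1;
    }
    if (pid == 0) {
        pthread_mutex_lock(m);
        raise(SIGKILL);                   /* die while holding the mutex */
        _exit(1);                         /* not reached */
    }
    waitpid(pid, NULL, 0);                /* the owner is now dead */

    int owner_died = 0;
    int rc = lock_with_recovery(m, &owner_died);
    if (rc == 0)
        pthread_mutex_unlock(m);
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return rc == 0 ? owner_died : -1;
}
```

If the new owner unlocks without calling `pthread_mutex_consistent()`, POSIX specifies that the mutex becomes permanently unusable and later lock attempts fail with ENOTRECOVERABLE, which is why the sketch marks it consistent before returning.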
Obviously you need both kernel and libc support for this to work. On Linux the kernel support behind this is called "robust futexes", and I've found references to userspace updates being applied to glibc HEAD.
In practice, support for this doesn't seem to have filtered down yet, in the Linux world at least. If these functions aren't available, you might find pthread_mutexattr_setrobust_np() there instead, which as far as I can gather appears to be a non-POSIX predecessor providing the same semantics. I've found references to pthread_mutexattr_setrobust_np() both in Solaris documentation and in /usr/include/pthread.h on Debian.
The POSIX spec can be found here: http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_setrobust.html
I have left this WRONG post undeleted in case someone else has the same idea and finds this discussion useful!
You can use this approach:
1) Lock the POSIX shared mutex.
2) Save the process-id in the shared memory.
3) Unlock the shared mutex.
4) On correct exit, clear the process-id.
If the process dumps core, the next process will find in shared memory the process-id that was saved in step 2. If no process with that process-id exists in the OS, then no one owns the shared mutex, so it is only necessary to replace the process-id.
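The "is there a process with this process-id" check could look roughly like the sketch below, which uses `kill(pid, 0)` to probe for existence without sending a signal. The `struct lock_owner` record and the function names are hypothetical. Keep in mind that the post itself is marked as wrong: among other things, process-ids can be reused, so a live process with the saved id is not necessarily the original owner.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record kept in shared memory next to the mutex.
 * A pid of 0 means "no owner recorded". */
struct lock_owner {
    pid_t pid;
};

/* Returns 1 if a process with this id currently exists, 0 otherwise.
 * Signal 0 performs the existence and permission checks only. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;   /* exists, but we may not signal it */
}

/* Step 2 of the approach: remember who is using the shared resource. */
void record_owner(struct lock_owner *o)
{
    o->pid = getpid();
}

/* Returns 1 if an owner is recorded but no such process exists any more. */
int owner_is_stale(const struct lock_owner *o)
{
    return o->pid != 0 && !process_exists(o->pid);
}
```

A process that finds `owner_is_stale()` returning 1 would overwrite the record with its own id via `record_owner()`. The check and the overwrite are not atomic, so they would themselves have to happen while holding the shared mutex.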
Update in order to answer the comment:
Scenario 1:
1. P1 starts.
2. P1 creates/opens the named mutex if it doesn't exist.
3. P1 timed_locks the named mutex and succeeds (waiting up to 10 seconds if necessary).
4. P1 dumps core.
5. P2 starts after the core dump.
6. P2 creates/opens the named mutex; it already exists, which is OK.
7. P2 timed_locks the named mutex and fails to lock it (after waiting up to 10 seconds).
8. P2 removes the named mutex.
9. P2 recreates the named mutex and locks it.