How to correctly destroy a pthread mutex

Posted 2019-03-30 14:34

Question:

How exactly can I destroy a pthread mutex variable?

Here is what I want to do. I want to have objects (structure variables) cached and looked up by key. I want the locking to be as fine-grained as possible, so I want a lock for each object, probably embedded in the structure, so that I get object-level locking.

Now the problem is how to safely destroy these objects. It looks like the first step is to remove the object from the lookup table so that it can no longer be found; that part is fine.

I want to free the object from the cache. Now how do I destroy/free the mutex correctly? The pthread_mutex_destroy documentation says we should not call pthread_mutex_destroy while the mutex is locked. Let's say a thread decides to destroy the object. It needs to destroy the lock, so it releases the lock and calls pthread_mutex_destroy. What happens to the other threads waiting for the object's lock?

Here is code that simulates the above. Note that I used sleep(2) to magnify the effect of the race.


#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

typedef struct exampleObj {
   pthread_mutex_t mutex;
   int key;
   int value1;
   int value2;
}exampleObj;

exampleObj sharedObj = {PTHREAD_MUTEX_INITIALIZER,0,0,0};

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; 

exampleObj* Lookup(int key) {
   return &sharedObj;
}

void* thrFunc(void* id) {
   int i = (*((int*)id));
   char errBuf[1024];
   exampleObj * obj = Lookup(0);

   if (pthread_mutex_lock(&obj->mutex)) {
      printf("Locking failed %d \n",i);
      return NULL;
   }
   // Do something
   printf("My id %d will do some work for 2 seconds.\n",i);
   sleep(2);
   pthread_mutex_unlock(&obj->mutex);
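   /* Racy on purpose: other threads may still be blocked on this mutex. */
   int errNum = pthread_mutex_destroy(&obj->mutex);
   /* Assumes the XSI strerror_r, which fills errBuf. */
   strerror_r(errNum,errBuf,sizeof errBuf);
   printf("Destroying mutex from thread %d : %s\n",i,errBuf);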
   return NULL;
}

int main() {
   pthread_t thrds[10];
   int i;
   int args[10];

   for (i=0;i<10;i++){
      args[i] = i;
      pthread_create(&thrds[i],NULL,thrFunc,args+i);
   }

   for (i=0;i<10;i++){
      pthread_join(thrds[i],NULL);
   }
   return 0;
}

Multiple threads succeed in destroying the mutex, and the remaining threads hang forever. gdb shows that those threads are waiting for the lock.

Answer 1:

The basic problem you have is that removing an object from the cache is something that requires synchronisation at the cache level, not the object level.

One way to implement this is by having a global lock for the entire cache that is only held during lookups, and is dropped once the object lock has been acquired. This lock can be a reader-writer lock, held for writing only if a thread is going to remove the object. So a thread that wishes to use a cache object would do:

pthread_rwlock_rdlock(&cache_lock);
exampleObj * obj = Lookup(key);
pthread_mutex_lock(&obj->mutex);
pthread_rwlock_unlock(&cache_lock);

/* Do some work on obj */

pthread_mutex_unlock(&obj->mutex);

and a thread that wishes to destroy a cache object would do:

pthread_rwlock_wrlock(&cache_lock);
exampleObj * obj = Lookup(key);
pthread_mutex_lock(&obj->mutex);
Remove(key);
pthread_rwlock_unlock(&cache_lock);

/* Do some cleanup work on obj */
pthread_mutex_unlock(&obj->mutex);
pthread_mutex_destroy(&obj->mutex);

(where the Remove() function removes the object from the cache so that subsequent Lookup() calls cannot return it).



Answer 2:

It's undefined behavior to (a) attempt to destroy a locked mutex, or (b) reference a destroyed mutex other than to call pthread_mutex_init to recreate it (See documentation). That means that the thread that destroys your shared mutex is going to race with the others locking it, and either (1) destroy happens first, other threads invoke undefined behavior trying to lock because of (b) or (2) lock in another thread happens first and destroying thread invokes undefined behavior because of (a).

You need to change your design so that a mutex under active contention is never destroyed. For your example, you could destroy the shared mutex in main after all threads are joined. For the program you describe, you probably need to insert a reference count in the objects.
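A reference count along these lines could look like the sketch below. The names refObj, obj_new, obj_get and obj_put are hypothetical, and C11 atomics are just one way to maintain the counter. The sketch assumes that obj_get is only called by a thread that already owns a reference, or while the lookup structure is locked, and that every user unlocks the mutex before dropping its reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct refObj {
   pthread_mutex_t mutex;
   atomic_int refs;      /* number of outstanding references */
   int value;
} refObj;

/* Create an object holding one reference (for example, the cache's). */
refObj *obj_new(void) {
   refObj *o = malloc(sizeof *o);
   if (o == NULL) return NULL;
   if (pthread_mutex_init(&o->mutex, NULL) != 0) { free(o); return NULL; }
   atomic_init(&o->refs, 1);
   o->value = 0;
   return o;
}

/* Take an extra reference. The caller must already hold one, or hold
   the lock that protects the lookup structure. */
void obj_get(refObj *o) {
   atomic_fetch_add(&o->refs, 1);
}

/* Drop a reference. Returns 1 if this call destroyed the object. */
int obj_put(refObj *o) {
   if (atomic_fetch_sub(&o->refs, 1) == 1) {
      /* Last reference gone: no thread can be locking or waiting. */
      pthread_mutex_destroy(&o->mutex);
      free(o);
      return 1;
   }
   return 0;
}
```

With this scheme, removing an object from the cache only drops the cache's reference. The mutex is destroyed by whichever thread releases the last reference, which by construction happens after all users have finished with it.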



Answer 3:

"I want to free the object from the cache. Now how do I destroy/free the mutex correctly? The pthread_mutex_destroy documentation says we should not call pthread_mutex_destroy while the mutex is locked. Let's say a thread decides to destroy the object. It needs to destroy the lock, so it releases the lock and calls pthread_mutex_destroy. What happens to the other threads waiting for the object's lock?"

Well, I hope I understand your intention correctly; I had the exact same problem. Later I realized that I had been stupid: complaining about undefined behaviour of the pthread_mutex_* functions after pthread_mutex_destroy() is like complaining about segfaults when accessing a pointer after free().

Most C programs are modelled around the paradigm that every program must make sure that memory is not accessed after some sort of destruction. Good C programs will have a design that prevents pointers from being spread everywhere, so that destruction happens only at well defined places, when no other variable contains a pointer anymore. This is not at all a concern in garbage collected languages.

Solution 1: Use refcounting, as is done for memory allocation. The refcounter is accessed via atomic functions. (Use GLib; it contains great, portable stuff.)

Solution 1b: Use refcounting, as is done for memory allocation, but separate the kinds of workers that are important from those that aren't, and use weak references in the latter so that they do not prevent object destruction.

Solution 2: Do not destroy the mutex. Why bother with saving RAM? Just make a global static array of, say, 128k objects. Add a struct member which indicates the state of the object. Instead of destroying the object, just atomically compare-and-set the state variable, and print an error in the threads that access an object in the "DISABLED" state.

Solution 3 - The hard way: Don't do shared-memory concurrency. Combine a thread pool that matches the number of CPUs on the system with non-blocking IO, message objects and a state-machine design. Make a message queue for each task, and let tasks communicate only by enqueuing messages in each other's queues. Put the queue in the same 'select' or 'pollfd' set that contains the sockets/file descriptors. To shuffle big data (3D game) between state machines, use a struct with an atomic refcounter and copy-on-write semantics. This will in most cases be the most performant, stable and maintainable solution.

If what you do has anything to do with performance, think twice about using the atomic operations. They can be more expensive than mutexes.
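As a rough illustration of Solution 2, the sketch below keeps a static pool whose mutexes are initialised once and never destroyed, and flips a per-slot state with an atomic compare-and-set. The pool size, the state names other than "DISABLED", and the functions slot_alloc, slot_disable and slot_set are all assumptions made for the example. Disabled slots are not recycled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE 128   /* hypothetical; the answer suggests ~128k */

enum { SLOT_FREE, SLOT_ACTIVE, SLOT_DISABLED };

typedef struct slot {
   pthread_mutex_t mutex;   /* initialised once, never destroyed */
   atomic_int state;
   int value;
} slot;

static slot pool[POOL_SIZE];
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

static void pool_init(void) {
   for (int i = 0; i < POOL_SIZE; i++) {
      pthread_mutex_init(&pool[i].mutex, NULL);
      atomic_init(&pool[i].state, SLOT_FREE);
   }
}

/* Return the slot for index i, or NULL if i is out of range. */
static slot *slot_at(int i) {
   pthread_once(&pool_once, pool_init);
   return (i >= 0 && i < POOL_SIZE) ? &pool[i] : NULL;
}

/* Claim the first free slot. Returns its index, or -1 if none is free. */
int slot_alloc(void) {
   pthread_once(&pool_once, pool_init);
   for (int i = 0; i < POOL_SIZE; i++) {
      int expected = SLOT_FREE;
      if (atomic_compare_exchange_strong(&pool[i].state, &expected,
                                         SLOT_ACTIVE))
         return i;
   }
   return -1;
}

/* "Destroy" a slot by moving it from ACTIVE to DISABLED.
   Returns 1 if this call made the change, 0 otherwise. */
int slot_disable(int i) {
   slot *s = slot_at(i);
   if (s == NULL) return 0;
   int expected = SLOT_ACTIVE;
   return atomic_compare_exchange_strong(&s->state, &expected,
                                         SLOT_DISABLED);
}

/* Store v under the slot's mutex. Returns 0 on success, or -1 if the
   index is bad or the slot is not ACTIVE. */
int slot_set(int i, int v) {
   slot *s = slot_at(i);
   if (s == NULL) return -1;
   pthread_mutex_lock(&s->mutex);    /* always safe: never destroyed */
   int ok = atomic_load(&s->state) == SLOT_ACTIVE;
   if (ok) s->value = v;
   pthread_mutex_unlock(&s->mutex);
   return ok ? 0 : -1;
}
```

Because the mutex outlives every user, a late thread that still holds an index can lock it safely and simply finds the slot disabled, instead of touching a destroyed mutex.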



Answer 4:

I couldn't agree more with caf on this. We have done something similar in one implementation (e.g. see the ifData_createReference and ifData_removeReference routines in ifMIB.c). The basic idea is to keep a global lock that guards the entire object list and an object-level lock that guards each individual entry in the list.

When we have to create a new entry in the list, we take a WRITE lock on the list and add the new entry, so that all users of the list see it added consistently. Then we release the list lock.

When we have to look up or access an entry in the list, we take a READ lock on the list and search for the entry. Once we find it, we take the object lock in READ mode for read-only operations, or in WRITE mode for modifying the entry. Then we release the list lock. Once we are done processing the entry, we release the object lock as well.

When an entry has to be removed from the list, take a WRITE lock on the list. Search for the entry and find it. Take a WRITE lock on the entry; this ensures that you are the ONLY current user of the object. Now remove the entry from the list, so that no one can find it there any more, and release the object lock immediately. Then release the list lock. Finally, destroy the object and release its resources.