This question is based on:
When is it safe to destroy a pthread barrier?
and the recent glibc bug report:
http://sourceware.org/bugzilla/show_bug.cgi?id=12674
I'm not sure about the semaphores issue reported in glibc, but presumably it's supposed to be valid to destroy a barrier as soon as `pthread_barrier_wait` returns, as per the above linked question. (Normally, the thread that got `PTHREAD_BARRIER_SERIAL_THREAD`, or a "special" thread that already considered itself "responsible" for the barrier object, would be the one to destroy it.) The main use case I can think of is when a barrier is used to synchronize a new thread's use of data on the creating thread's stack, preventing the creating thread from returning until the new thread gets to use the data; other barriers probably have a lifetime equal to that of the whole program, or controlled by some other synchronization object.
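For concreteness, a minimal sketch of that stack-data use case (the names here are mine, and error handling is omitted):

```c
#include <pthread.h>
#include <stddef.h>

struct start_args {
    pthread_barrier_t barrier;
    int data;                      /* lives on the creating thread's stack */
};

static void *thread_fn(void *p)
{
    struct start_args *args = p;
    int data = args->data;         /* copy the stack data out */
    pthread_barrier_wait(&args->barrier);
    /* args may be gone now; use only the local copy */
    (void)data;                    /* ... real work using data ... */
    return NULL;
}

void spawn(void)
{
    struct start_args args = { .data = 42 };
    pthread_t t;
    pthread_barrier_init(&args.barrier, NULL, 2);
    pthread_create(&t, NULL, thread_fn, &args);
    /* Can't return until the new thread has copied the data... */
    pthread_barrier_wait(&args.barrier);
    /* ...and, per the question, destroying right here must be safe. */
    pthread_barrier_destroy(&args.barrier);
    pthread_detach(t);
}
```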
In any case, how can an implementation ensure that destruction of the barrier (and possibly even unmapping of the memory it resides in) is safe as soon as `pthread_barrier_wait` returns in any thread? It seems the other threads that have not yet returned would need to examine at least some part of the barrier object to finish their work and return, much like how, in the glibc bug report cited above, `sem_post` has to examine the waiters count after having adjusted the semaphore value.
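The shape of that `sem_post` hazard, paraphrased (this is an illustration of the bug report's scenario, not glibc's actual code; the type and the `wake_one()` helper are hypothetical):

```c
#include <stdatomic.h>

/* Paraphrase of the scenario in glibc bug 12674 -- NOT glibc's actual code. */
struct my_sem {
    atomic_int value;
    atomic_int nwaiters;
};

extern void wake_one(atomic_int *addr);  /* hypothetical futex-style wakeup */

void my_sem_post(struct my_sem *sem)
{
    atomic_fetch_add(&sem->value, 1);
    /* A waiter can observe the new value, return from its wait, and
       destroy or unmap *sem before the next line reads sem->nwaiters: */
    if (atomic_load(&sem->nwaiters) > 0)
        wake_one(&sem->value);
}
```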
I'm going to take another crack at this with an example implementation of `pthread_barrier_wait()` that uses mutex and condition variable functionality as might be provided by a pthreads implementation. Note that this example doesn't try to deal with performance considerations (specifically, when the waiting threads are unblocked, they are all re-serialized when exiting the wait). I think that using something like Linux futex objects could help with the performance issues, but futexes are still pretty much out of my experience. Also, I doubt that this example handles signals or errors correctly (if at all in the case of signals). But I think proper support for those things can be added as an exercise for the reader.

My main fear is that the example may have a race condition or deadlock (the mutex handling is more complex than I like). Also note that it is an example that hasn't even been compiled. Treat it as pseudo-code. Also keep in mind that my experience is mainly in Windows - I'm tackling this more as an educational opportunity than anything else. So the quality of the pseudo-code may well be pretty low.

However, disclaimers aside, I think it may give an idea of how the problem asked in the question could be handled (i.e., how the `pthread_barrier_wait()` function can allow the `pthread_barrier_t` object it uses to be destroyed by any of the released threads without the danger of one or more threads using the barrier object on their way out). Here goes:
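What follows is a minimal sketch of the idea rather than a real implementation: the `barrier_t` and `waitdata` names are made up, and error and signal handling are omitted entirely.

```c
#include <pthread.h>

/* Per-round synchronization block; lives on the first waiter's stack. */
struct waitdata {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    unsigned        still_here;   /* threads not yet out of this round */
    int             released;     /* set once the round completes */
};

typedef struct {
    pthread_mutex_t  mtx;      /* protects the fields below */
    unsigned         count;    /* threads required per round */
    unsigned         arrived;  /* threads arrived so far this round */
    struct waitdata *wd;       /* current round's block, or NULL */
} barrier_t;                   /* init sets arrived = 0, wd = NULL */

int barrier_wait(barrier_t *b)
{
    struct waitdata  local;    /* used only if we're first in this round */
    struct waitdata *wd;
    int first = 0, serial = 0;

    pthread_mutex_lock(&b->mtx);
    if (b->wd == NULL) {
        /* First arrival: publish a block that lives on *our* stack. */
        first = 1;
        pthread_mutex_init(&local.mtx, NULL);
        pthread_cond_init(&local.cv, NULL);
        local.still_here = b->count;
        local.released   = 0;
        b->wd = &local;
    }
    wd = b->wd;
    if (++b->arrived == b->count) {
        b->arrived = 0;        /* reset for the next round... */
        b->wd = NULL;          /* ...which gets a fresh block */
        serial = 1;
    }
    pthread_mutex_unlock(&b->mtx);
    /* No thread touches *b beyond this point in this call, so once all
       'count' threads get here the caller may destroy or reuse it. */

    pthread_mutex_lock(&wd->mtx);
    if (serial) {
        wd->released = 1;
        pthread_cond_broadcast(&wd->cv);
    } else {
        while (!wd->released)
            pthread_cond_wait(&wd->cv, &wd->mtx);
    }

    /* Re-serialized exit: the owner of the stack block leaves last. */
    if (--wd->still_here == 0)
        pthread_cond_broadcast(&wd->cv);  /* wake the owner if it waits */
    else if (first)
        while (wd->still_here > 0)
            pthread_cond_wait(&wd->cv, &wd->mtx);
    pthread_mutex_unlock(&wd->mtx);

    if (first) {
        pthread_mutex_destroy(&local.mtx);
        pthread_cond_destroy(&local.cv);
    }
    return serial ? PTHREAD_BARRIER_SERIAL_THREAD : 0;
}
```

The key property is that no thread touches `*b` after dropping `b->mtx`; the price is the re-serialized exit noted above, where the thread that donated its stack for the `waitdata` block has to be the last one out.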
17 July 2011: Update in response to a comment/question about process-shared barriers
I forgot completely about the situation with barriers that are shared between processes. And as you mention, the idea I outlined will fail horribly in that case. I don't really have experience with POSIX shared memory use, so any suggestions I make should be tempered with scepticism.
To summarize (for my benefit, if no one else's):
When any of the threads gets control after `pthread_barrier_wait()` returns, the barrier object needs to be in the 'init' state (however the most recent `pthread_barrier_init()` on that object set it). Also implied by the API is that once any of the threads return, one or more of the following things could occur:

- another call to `pthread_barrier_wait()` to start a new round of synchronization of threads
- a call to `pthread_barrier_destroy()` on the barrier object

These things mean that before the `pthread_barrier_wait()` call allows any thread to return, it pretty much needs to ensure that all waiting threads are no longer using the barrier object in the context of that call. My first answer addressed this by creating a 'local' set of synchronization objects (a mutex and an associated condition variable) outside of the barrier object that would block all the threads. These local synchronization objects were allocated on the stack of the thread that happened to call `pthread_barrier_wait()` first.

I think that something similar would need to be done for barriers that are process-shared. However, in that case simply allocating those sync objects on a thread's stack isn't adequate (since the other processes would have no access). For a process-shared barrier, those objects would have to be allocated in process-shared memory. I think the technique I listed above could be applied similarly:
- the `waitdata_mutex` that controls the 'allocation' of the local sync variables (the `waitdata` block) would be in process-shared memory already by virtue of it being in the barrier struct. Of course, when the barrier is set to `PTHREAD_PROCESS_SHARED`, that attribute would also need to be applied to the `waitdata_mutex`
- when `__barrier_waitdata_init()` is called to initialize the local mutex & condition variable, it would have to allocate those objects in shared memory instead of simply using the stack-based `waitdata` variable
- when the 'cleanup' thread destroys the mutex & the condition variable in the `waitdata` block, it would also need to clean up the process-shared memory allocation for the block

I think these changes would allow the scheme to operate with process-shared barriers. The last bullet point above is a key item to figure out. Another is how to construct a name for the shared memory object that will hold the 'local' process-shared `waitdata`. There are certain attributes you'd want for that name:

- you'd want it to be stored in the `struct pthread_barrier_t` structure so all processes have access to it; that means a known limit to the length of the name
- you'd want it to be unique to each 'instance' of `pthread_barrier_wait()`, because it might be possible for a second round of waiting to start before all threads have gotten all the way out of the first round of waiting (so the process-shared memory block set up for the `waitdata` might not have been freed yet). So the name probably has to be based on things like process id, thread id, address of the barrier object, and an atomic counter.
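One hedged sketch of how such a uniquely-named process-shared `waitdata` block might be allocated; `shm_open()`/`mmap()` are real POSIX calls, but the name format and the `make_waitdata_name()`/`alloc_shared_waitdata()` helpers are just one possible choice of mine:

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define WDNAME_MAX 64   /* known length limit, so the name fits in the barrier */

struct waitdata {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    unsigned        still_here;
    int             released;
};

/* One possible naming scheme: pid + barrier address + atomic counter,
   so each round of waiting gets a distinct shared-memory object. */
static void make_waitdata_name(char *buf, size_t len,
                               const void *barrier, atomic_uint *ctr)
{
    snprintf(buf, len, "/wd-%ld-%p-%u", (long)getpid(), barrier,
             atomic_fetch_add(ctr, 1));
}

static struct waitdata *alloc_shared_waitdata(const char *name)
{
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct waitdata)) != 0) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    struct waitdata *wd = mmap(NULL, sizeof(struct waitdata),
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping keeps the object alive */
    if (wd == MAP_FAILED) {
        shm_unlink(name);
        return NULL;
    }
    /* The mutex and condvar must themselves be process-shared. */
    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&wd->mtx, &ma);
    pthread_cond_init(&wd->cv, &ca);
    pthread_mutexattr_destroy(&ma);
    pthread_condattr_destroy(&ca);
    return wd;
}
```

The 'cleanup' thread from the last bullet would then `munmap()` the block and `shm_unlink()` the name once every waiter, in every process, is done with it.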
As far as I can see, there is no need for `pthread_barrier_destroy` to be an immediate operation. You could have it wait until all threads that are still in their wakeup phase are woken up.

E.g., you could have an atomic counter `awakening`, initially set to the number of threads being woken up. It would then be decremented as the last action before `pthread_barrier_wait` returns. `pthread_barrier_destroy` could then just spin until that counter falls to `0`.
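A minimal sketch of that counter scheme, assuming C11 atomics; the type layout and the `wait_for_round()` helper are illustrative, and the wait/wake mechanics of the barrier itself are elided:

```c
#include <sched.h>
#include <stdatomic.h>

/* Illustrative layout only -- not an actual pthread_barrier_t. */
typedef struct {
    unsigned    count;      /* threads per round */
    atomic_uint awakening;  /* threads still in their wakeup phase */
    /* ... whatever state the wait/wake mechanism itself needs ... */
} my_barrier_t;

/* Hypothetical: blocks until the round completes; the last arrival sets
   awakening = b->count before releasing anyone, and the return value says
   whether this thread is the 'serial' one. */
extern int wait_for_round(my_barrier_t *b);

int my_barrier_wait(my_barrier_t *b)
{
    int serial = wait_for_round(b);
    /* Last action before returning: announce we are done with *b. */
    atomic_fetch_sub_explicit(&b->awakening, 1, memory_order_release);
    return serial;
}

int my_barrier_destroy(my_barrier_t *b)
{
    /* Safe even if called the instant any my_barrier_wait() returns:
       spin until every waiter has performed its final decrement. */
    while (atomic_load_explicit(&b->awakening, memory_order_acquire) != 0)
        sched_yield();
    /* ... tear down the rest of the barrier state ... */
    return 0;
}
```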