I need to fire off a bunch of threads and would like to bring them down gracefully.
I'm trying to use pthread_cond_signal/pthread_cond_wait to achieve this but am running into a problem.
Here's my code. Firstly, the thread main function:
static void *thrmain( void * arg )
{
    // acquire references to the cond var, mutex, finished flag and
    // message queue
    .....
    while( true )
    {
        pthread_mutex_lock( &lock );
        if ( msq.empty() )
        {
            // no messages so wait for one.
            pthread_cond_wait( &cnd, &lock );
        }
        // are we finished.
        if ( finished )
        {
            // finished so unlock the mutex and get out of here
            pthread_mutex_unlock( &lock );
            break;
        }
        if ( !msq.empty() )
        {
            // retrieve msg
            ....
            // finished with lock
            pthread_mutex_unlock( &lock );
            // perform action based on msg
            // outside of lock to avoid deadlock
        }
        else
        {
            // nothing to do so we're
            // finished with the lock.
            pthread_mutex_unlock( &lock );
        }
    }
    return 0;
}
Now, this all looks fine and dandy (to me anyway).
So to tear down the threads I have this method
void teardown()
{
    // set the global finished var
    pthread_mutex_lock( &lock );
    finished = true;
    pthread_mutex_unlock( &lock );
    // loop over the threads, signalling them
    for ( int i = 0 ; i < threads.size() ; ++i )
    {
        // send a signal per thread to wake it up
        // and get it to check its finished flag
        pthread_cond_signal( &cnd );
    }
    // need to loop over the threads and join them.
    for ( int i = 0 ; i < threads.size() ; ++i )
    {
        pthread_join( threads[ i ].tid, NULL );
    }
}
Now I know that pthread_cond_signal doesn't guarantee which thread it wakes up, so I can't signal and join in the same loop. However, this is where it's all going wrong. pthread_cond_signal does nothing if there is no thread waiting, so potentially some of the threads won't have been signalled and therefore won't know to exit.
How do I overcome this?
M.
***** UPDATE ******* Please don't post that I should use pthread_cond_broadcast, as this exhibits EXACTLY THE SAME BEHAVIOUR. It will only wake up threads that are actually waiting on the cond var. Any thread that is processing during this time and comes back to wait later will have missed the signal and will be oblivious.
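For reference, the usual guard against a missed wake-up is to treat the condition variable only as a hint and re-check the shared state (queue and finished flag) under the mutex before every wait. A thread that was busy when the flag was set then sees it on its next pass and never blocks. Below is a minimal sketch of that pattern, not a definitive fix. The `Shared` struct, the `processed` counter and the `run_demo` helper are hypothetical names introduced for illustration and are not part of the code above. Unlike the original loop, this version drains any queued messages before exiting.

```cpp
#include <pthread.h>
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical shared state; field names mirror the variables in the question.
struct Shared {
    pthread_mutex_t lock;
    pthread_cond_t  cnd;
    std::queue<int> msq;
    bool finished;
    int  processed;   // illustration only: counts handled messages
};

static void *thrmain(void *arg)
{
    Shared *s = static_cast<Shared *>(arg);
    for (;;) {
        pthread_mutex_lock(&s->lock);
        // Wait only while there is nothing to do AND we are not finished.
        // The check happens under the lock, so a flag set while this thread
        // was busy is noticed here even though its signal was never received.
        while (s->msq.empty() && !s->finished)
            pthread_cond_wait(&s->cnd, &s->lock);
        if (s->msq.empty()) {
            // Loop exited with an empty queue, so finished must be true.
            pthread_mutex_unlock(&s->lock);
            break;
        }
        int msg = s->msq.front();
        s->msq.pop();
        pthread_mutex_unlock(&s->lock);

        (void)msg;  // act on msg here, outside the lock

        pthread_mutex_lock(&s->lock);
        ++s->processed;
        pthread_mutex_unlock(&s->lock);
    }
    return 0;
}

// Starts nthreads workers, queues nmessages messages, tears everything down
// and returns how many messages were processed.
int run_demo(int nthreads, int nmessages)
{
    Shared s;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cnd, NULL);
    s.finished = false;
    s.processed = 0;

    std::vector<pthread_t> tids(nthreads);
    for (int i = 0; i < nthreads; ++i) {
        int rc = pthread_create(&tids[i], NULL, thrmain, &s);
        assert(rc == 0);
        (void)rc;
    }

    for (int m = 0; m < nmessages; ++m) {
        pthread_mutex_lock(&s.lock);
        s.msq.push(m);
        pthread_mutex_unlock(&s.lock);
        pthread_cond_signal(&s.cnd);      // wake one waiter, if any
    }

    // Teardown: set the flag under the lock, then wake every current waiter.
    pthread_mutex_lock(&s.lock);
    s.finished = true;
    pthread_mutex_unlock(&s.lock);
    pthread_cond_broadcast(&s.cnd);

    for (int i = 0; i < nthreads; ++i)
        pthread_join(tids[i], NULL);

    int n = s.processed;
    pthread_cond_destroy(&s.cnd);
    pthread_mutex_destroy(&s.lock);
    return n;
}
```

With this structure, `run_demo(4, 100)` should return 100 however the threads are scheduled. The broadcast only has to reach threads that are blocked at that moment, and every other thread finds `finished` set when it next takes the lock. Using `while` rather than `if` around the wait also covers spurious wake-ups.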