graceful thread termination with pthread_cond_sign

Posted 2019-07-20 16:14

Question:

I need to fire off a bunch of threads and would like to bring them down gracefully.

I'm trying to use pthread_cond_signal/pthread_cond_wait to achieve this but am running into a problem.

Here's my code. First, the thread main function:

static void *thrmain( void * arg )
{
    // acquire references to the cond var, mutex, finished flag and
    // message queue
    .....

    while( true )
    {
        pthread_mutex_lock( &lock );

        if ( msq.empty() )
        {
            // no messages so wait for one.
            pthread_cond_wait( &cnd, &lock );
        }

        // are we finished.
        if ( finished )
        {
            // finished so unlock the mutex and get out of here
            pthread_mutex_unlock( &lock );
            break;
        }

        if ( !msq.empty() )
        {
            // retrieve msg
            ....

            // finished with lock
            pthread_mutex_unlock( &lock );

            // perform action based on msg
            // outside of lock to avoid deadlock
        }
        else
        {
            // nothing to do so we're
            // finished with the lock.
            pthread_mutex_unlock( &lock );
        }
    }

    return 0;
}

Now, this all looks fine and dandy (to me anyway).

So to tear down the threads I have this method

void teardown()
{
    // set the global finished var
    pthread_mutex_lock( &lock );
    finished = true;
    pthread_mutex_unlock( &lock );

    // loop over the threads, signalling them
    for ( int i = 0 ; i < threads.size() ; ++i )
    {
        // send a signal per thread to wake it up
        // and get it to check its finished flag
        pthread_cond_signal( &cnd );
    }

    // need to loop over the threads and join them.
    for ( int i = 0 ; i < threads.size() ; ++i )
    {
        pthread_join( threads[ i ].tid, NULL );
    }
}

Now I know that pthread_cond_signal doesn't guarantee which thread it wakes up so I can't signal and join in the same loop. However, this is where it's all going wrong. pthread_cond_signal does nothing if there is no thread waiting so potentially some of the threads won't have been signalled and therefore won't know to exit.

How do I overcome this?

M.

***** UPDATE ***** Please don't post that I should use pthread_cond_broadcast, as this exhibits EXACTLY THE SAME BEHAVIOUR. It will only wake up threads that are actually waiting on the cond var. Any thread that is processing during this time and comes back to wait later will have missed the signal and will be oblivious.


Answer 1:

Firstly, you have to change your predicate from

if ( msq.empty() ) {
  // no messages so wait for one.
  pthread_cond_wait( &cnd, &lock );
}

to

while ( msq.empty() ) {
  // no messages so wait for one.
  pthread_cond_wait( &cnd, &lock );
}

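That's a pthreads requirement: pthread_cond_wait can return without a matching signal, so you have to guard yourself against spurious wakeups by re-checking the predicate in a loop.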

Now you can change that to

while ( msq.empty()  && !finished) {
  // no messages so wait for one.
  pthread_cond_wait( &cnd, &lock );
}

Since you already test whether finished is set right after that check, and exit if so, all you have to do is signal all the threads.

So, in your teardown function, replace the loop with:

pthread_cond_broadcast( &cnd );

That should ensure all waiting threads wake up, see finished set to true, and exit.

This is safe even if your threads are not blocked in pthread_cond_wait. Threads that are processing a message will not receive the wakeup; however, they will finish that processing, enter the loop again, see that finished == true (so they never wait), and exit.
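Putting the pieces above together, here is a minimal sketch of the worker and teardown. It assumes the names from the question (lock, cnd, msq, finished); the int message type and the run_and_teardown helper are hypothetical and exist only to make the sketch self-contained.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Shared state. Names mirror the question; the int payload is an assumption.
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cnd  = PTHREAD_COND_INITIALIZER;
static std::queue<int> msq;
static bool finished = false;

static void *thrmain( void * )
{
    while ( true )
    {
        pthread_mutex_lock( &lock );

        // Re-check the predicate after every wakeup. A thread that was busy
        // during the broadcast sees finished == true here and never waits.
        while ( msq.empty() && !finished )
        {
            pthread_cond_wait( &cnd, &lock );
        }

        if ( finished )
        {
            pthread_mutex_unlock( &lock );
            break;
        }

        int msg = msq.front();
        msq.pop();
        pthread_mutex_unlock( &lock );

        (void) msg;   // the real work would happen here, outside the lock
    }
    return 0;
}

// Hypothetical helper: starts n workers, posts a few messages, tears down.
// Returns the number of threads that were joined successfully.
int run_and_teardown( int n )
{
    finished = false;
    std::vector<pthread_t> threads;
    for ( int i = 0 ; i < n ; ++i )
    {
        pthread_t tid;
        if ( pthread_create( &tid, NULL, thrmain, NULL ) == 0 )
            threads.push_back( tid );
    }

    for ( int m = 0 ; m < 3 ; ++m )
    {
        pthread_mutex_lock( &lock );
        msq.push( m );
        pthread_mutex_unlock( &lock );
        pthread_cond_signal( &cnd );
    }

    // Teardown: set the flag under the mutex, then wake every waiter.
    pthread_mutex_lock( &lock );
    finished = true;
    pthread_mutex_unlock( &lock );
    pthread_cond_broadcast( &cnd );

    int joined = 0;
    for ( std::size_t i = 0 ; i < threads.size() ; ++i )
    {
        if ( pthread_join( threads[ i ], NULL ) == 0 )
            ++joined;
    }
    return joined;
}
```

Calling run_and_teardown( 4 ) should return 4 whether or not any worker happened to be waiting at the moment of the broadcast. Note that in this sketch, messages still queued when finished is set are dropped rather than processed.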

Another common pattern for this is to inject a poison message. A poison message is simply a special message that your thread recognises as meaning "STOP". You would place as many of those messages in your queue as you have threads, so that each thread consumes exactly one and exits.
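The poison-message idea could be sketched roughly as follows. The POISON sentinel value, the post helper, the handled counter, and run_with_poison are all hypothetical names introduced for illustration; the queue, mutex and condition variable follow the question.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

static const int POISON = -1;   // hypothetical sentinel meaning "STOP"

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cnd  = PTHREAD_COND_INITIALIZER;
static std::queue<int> msq;
static int handled = 0;         // counts real messages, for illustration only

static void *poison_worker( void * )
{
    while ( true )
    {
        pthread_mutex_lock( &lock );
        while ( msq.empty() )
        {
            pthread_cond_wait( &cnd, &lock );
        }
        int msg = msq.front();
        msq.pop();
        if ( msg != POISON )
            ++handled;
        pthread_mutex_unlock( &lock );

        if ( msg == POISON )
            break;              // this thread has consumed its own STOP message
        // the real work for msg would happen here, outside the lock
    }
    return 0;
}

// Push one message under the mutex, then wake one waiter (if any).
static void post( int msg )
{
    pthread_mutex_lock( &lock );
    msq.push( msg );
    pthread_mutex_unlock( &lock );
    pthread_cond_signal( &cnd );
}

// Hypothetical helper: returns how many real messages were handled.
int run_with_poison( int nthreads, int nmessages )
{
    handled = 0;
    std::vector<pthread_t> threads;
    for ( int i = 0 ; i < nthreads ; ++i )
    {
        pthread_t tid;
        if ( pthread_create( &tid, NULL, poison_worker, NULL ) == 0 )
            threads.push_back( tid );
    }

    for ( int m = 0 ; m < nmessages ; ++m )
        post( m );              // real messages are non-negative here

    // One poison message per running thread, queued behind the real work.
    for ( std::size_t i = 0 ; i < threads.size() ; ++i )
        post( POISON );

    for ( std::size_t i = 0 ; i < threads.size() ; ++i )
        pthread_join( threads[ i ], NULL );

    return handled;
}
```

Because the queue is FIFO, every real message posted before the poison messages is dequeued before any thread stops, so this variant drains the queue on shutdown instead of dropping pending work.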



Answer 2:

I guess you should be unlocking the mutex after the call to pthread_cond_signal. Also, check the "finished" flag after acquiring the mutex and before entering the conditional wait. Hope this helps!



Answer 3:

You want to use pthread_cond_broadcast() instead of pthread_cond_signal(). The former unblocks all threads waiting on a given condition.



Answer 4:

I have never used pthreads directly (I prefer Boost.Threads), but I think you should be calling pthread_cancel instead of pthread_cond_signal.