I'm using a common base class has_threads
to manage any type that should be allowed to instantiate a boost::thread
.
Instances of has_threads
each own a set
of thread
s (to support waitAll
and interruptAll
functions, which I do not include below), and should automatically invoke removeThread
when a thread terminates to maintain this set
's integrity.
In my program, I have just one of these. Threads are created on an interval every 10s, and each performs a database lookup. When the lookup is complete, the thread runs to completion and removeThread
should be invoked; with a mutex set, the thread object is removed from internal tracking. I can see this working properly with the output ABC
.
Once in a while, though, the mechanisms collide. removeThread
is executed perhaps twice concurrently. What I can't figure out is why this results in a deadlock. All thread invocations from this point never output anything other than A
. [It's worth noting that I'm using thread-safe stdlib, and that the issue remains when IOStreams are not used.] Stack traces indicate that the mutex is locking these threads, but why would the lock not be eventually released by the first thread for the second, then the second for the third, and so on?
Am I missing something fundamental about how scoped_lock
works? Is there anything obvious here that I've missed that could lead to a deadlock, despite (or even due to?) the use of a mutex lock?
Sorry for the poor question, but as I'm sure you're aware it's nigh-on impossible to present real testcases for bugs like this.
class has_threads {
protected:
template <typename Callable>
void createThread(Callable f, bool allowSignals)
{
boost::mutex::scoped_lock l(threads_lock);
// Create and run thread
boost::shared_ptr<boost::thread> t(new boost::thread());
// Track thread
threads.insert(t);
// Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler)
*t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals);
}
private:
/**
* Entrypoint function for a thread.
* Sets up the on-end handler then invokes the user-provided worker function.
*/
template <typename Callable>
void runThread(Callable f, bool allowSignals)
{
boost::this_thread::at_thread_exit(
boost::bind(
&has_threads::releaseThread,
this,
boost::this_thread::get_id()
)
);
if (!allowSignals)
blockSignalsInThisThread();
try {
f();
}
catch (boost::thread_interrupted& e) {
// Yes, we should catch this exception!
// Letting it bubble over is _potentially_ dangerous:
// http://stackoverflow.com/questions/6375121
std::cout << "Thread " << boost::this_thread::get_id() << " interrupted (and ended)." << std::endl;
}
catch (std::exception& e) {
std::cout << "Exception caught from thread " << boost::this_thread::get_id() << ": " << e.what() << std::endl;
}
catch (...) {
std::cout << "Unknown exception caught from thread " << boost::this_thread::get_id() << std::endl;
}
}
void has_threads::releaseThread(boost::thread::id thread_id)
{
std::cout << "A";
boost::mutex::scoped_lock l(threads_lock);
std::cout << "B";
for (threads_t::iterator it = threads.begin(), end = threads.end(); it != end; ++it) {
if ((*it)->get_id() != thread_id)
continue;
threads.erase(it);
break;
}
std::cout << "C";
}
void blockSignalsInThisThread()
{
sigset_t signal_set;
sigemptyset(&signal_set);
sigaddset(&signal_set, SIGINT);
sigaddset(&signal_set, SIGTERM);
sigaddset(&signal_set, SIGHUP);
sigaddset(&signal_set, SIGPIPE); // http://www.unixguide.net/network/socketfaq/2.19.shtml
pthread_sigmask(SIG_BLOCK, &signal_set, NULL);
}
typedef std::set<boost::shared_ptr<boost::thread> > threads_t;
threads_t threads;
boost::mutex threads_lock;
};
struct some_component : has_threads {
some_component() {
// set a scheduler to invoke createThread(bind(&some_work, this)) every 10s
}
void some_work() {
// usually pretty quick, but I guess sometimes it could take >= 10s
}
};