pthread_mutex_lock __pthread_mutex_lock_full: Asse

Posted 2019-05-10 13:33

Question:

I'm working on a server-side project, which is supposed to accept more than 100 client connections.

It's a multithreaded program using boost::thread. In some places I use boost::lock_guard<boost::mutex> to lock the shared member data. There is also a BlockingQueue<ConnectionPtr> which holds the incoming connections. The implementation of the BlockingQueue:

template <typename DataType>
class BlockingQueue : private boost::noncopyable
{
public:
    BlockingQueue()
        : nblocked(0), stopped(false)
    {

    }

    ~BlockingQueue()
    {
        Stop(true);
    }

    void Push(const DataType& item)
    {
        boost::mutex::scoped_lock lock(mutex);
        queue.push(item);
        lock.unlock();
        cond.notify_one(); // cond.notify_all();
    }

    bool Empty() const
    {
        boost::mutex::scoped_lock lock(mutex);
        return queue.empty();
    }

    std::size_t Count() const
    {
        boost::mutex::scoped_lock lock(mutex);
        return queue.size();
    }

    bool TryPop(DataType& poppedItem)
    {
        boost::mutex::scoped_lock lock(mutex);
        if (queue.empty())
            return false;

        poppedItem = queue.front();
        queue.pop();

        return true;
    }

    DataType WaitPop()
    {
        boost::mutex::scoped_lock lock(mutex);

        ++nblocked;
        while (!stopped && queue.empty()) // Or: if (queue.empty())
            cond.wait(lock);
        --nblocked;

        if (stopped)
        {
            cond.notify_all(); // Tell Stop() that this thread has left
            BOOST_THROW_EXCEPTION(BlockingQueueTerminatedException());
        }

        DataType tmp(queue.front());
        queue.pop();

        return tmp;
    }

    void Stop(bool wait)
    {
        boost::mutex::scoped_lock lock(mutex);
        stopped = true;
        cond.notify_all();

        if (wait) // Wait until every thread blocked in BlockingQueue::WaitPop() has left
        {
            while (nblocked)
                cond.wait(lock);
        }
    }

private:
    std::queue<DataType>          queue;
    mutable boost::mutex          mutex;
    boost::condition_variable_any cond;
    unsigned int                  nblocked;
    bool                          stopped;
};

For each Connection, there is a ConcurrentQueue<StreamPtr>, which contains the input Streams. The implementation of the ConcurrentQueue:

template <typename DataType>
class ConcurrentQueue : private boost::noncopyable
{
public:
    void Push(const DataType& item)
    {
        boost::mutex::scoped_lock lock(mutex);
        queue.push(item);
    }

    bool Empty() const
    {
        boost::mutex::scoped_lock lock(mutex);
        return queue.empty();
    }

    bool TryPop(DataType& poppedItem)
    {
        boost::mutex::scoped_lock lock(mutex);
        if (queue.empty())
            return false;

        poppedItem = queue.front();
        queue.pop();

        return true;
    }
private:
    std::queue<DataType> queue;
    mutable boost::mutex mutex;
};

The program runs fine when I debug it. But under load testing with 50, 100, or more client connections, it sometimes aborts with

pthread_mutex_lock.c:321: __pthread_mutex_lock_full: Assertion `robust || (oldval & 0x40000000) == 0' failed.

I have no idea what is happening, and the failure cannot be reproduced on every run.

I googled a lot, but no luck. Please advise.

Thanks.

Peter

Answer 1:

0x40000000 is FUTEX_OWNER_DIED, which has the following documentation in the futex.h header:

/*
 * The kernel signals via this bit that a thread holding a futex
 * has exited without unlocking the futex. The kernel also does
 * a FUTEX_WAKE on such futexes, after setting the bit, to wake
 * up any possible waiters:
 */
#define FUTEX_OWNER_DIED        0x40000000
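
To make the assertion's test concrete, here is a minimal sketch that decodes a lock word the way the failed condition does. It assumes the futex word layout used by the robust and priority-inheritance mutex types (owner TID in the low bits, flag bits on top). The constant names, LockWord, decode_lock_word and assertion_holds are made up for this illustration; only the numeric values come from the futex.h header.

```cpp
#include <cstdint>

// Values mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from
// the futex.h header; redefined locally so the sketch is self-contained.
constexpr std::uint32_t kFutexWaiters   = 0x80000000u;
constexpr std::uint32_t kFutexOwnerDied = 0x40000000u;
constexpr std::uint32_t kFutexTidMask   = 0x3fffffffu;

// Hypothetical helper type: the three fields packed into the lock word.
struct LockWord {
    std::uint32_t owner_tid;   // kernel thread id of the owner
    bool          owner_died;  // owner exited without unlocking
    bool          has_waiters; // other threads are blocked on the futex
};

LockWord decode_lock_word(std::uint32_t oldval)
{
    return LockWord{ oldval & kFutexTidMask,
                     (oldval & kFutexOwnerDied) != 0,
                     (oldval & kFutexWaiters) != 0 };
}

// The condition from the failed assertion, written as a function:
// a non-robust mutex must never see the owner-died bit set.
bool assertion_holds(bool robust, std::uint32_t oldval)
{
    return robust || (oldval & kFutexOwnerDied) == 0;
}
```

With a word such as 0x40001234, decode_lock_word reports owner TID 0x1234 with owner_died set, and assertion_holds(false, 0x40001234) is false, which is the situation the abort message describes.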

So the assertion seems to indicate that a thread holding the lock is exiting for some reason. Is there a way that a thread object might be destroyed while it's holding a lock?
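
For illustration only, the owner-died condition can be observed safely with a robust POSIX mutex, where it is reported as an error code instead of tripping an assertion. This is a sketch assuming Linux with glibc; lock_and_exit and lock_after_owner_died are hypothetical names, and as far as I know boost::mutex uses a default (non-robust) pthread mutex, so this shows the mechanism and is not a fix for the program in the question.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Thread body: take the mutex and return without unlocking it.
static void* lock_and_exit(void* arg)
{
    pthread_mutex_lock(static_cast<pthread_mutex_t*>(arg));
    return nullptr; // thread ends while still owning the mutex
}

// Locks a robust mutex whose previous owner has exited and returns
// the result of that pthread_mutex_lock() call.
int lock_after_owner_died()
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);

    pthread_mutex_t m;
    int rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    pthread_t t;
    rc = pthread_create(&t, nullptr, lock_and_exit, &m);
    assert(rc == 0);
    pthread_join(t, nullptr); // the owner is gone, the mutex is still held

    rc = pthread_mutex_lock(&m);
    if (rc == EOWNERDEAD) {
        // We now own the mutex; mark its state usable again, then release.
        pthread_mutex_consistent(&m);
        pthread_mutex_unlock(&m);
    } else if (rc == 0) {
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return rc;
}
```

Calling lock_after_owner_died() should return EOWNERDEAD on such a system. A non-robust mutex has no way to report this state to the caller, which is why the library asserts instead.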

Another thing to check is if you have some sort of memory corruption somewhere. Valgrind might be a tool that can help you with that.