What can cause an exception 16: “mutex: Resource b

Posted 2019-07-22 05:56

Question:

I've ported a long-working stable library written in C++ and Boost to Blackberry 10. The library transfers files between devices. The library compiles and links well, and runs just fine. However, I consistently encounter a thrown exception on my Blackberry 10 device after 1, 2, or 3 files have been transferred. Catching the exception as a boost::system::system_error in the source code shows it is exception 16, with a text of "mutex: Resource busy".

Here is the source code where the exception occurs:

try
{
    . . .

    // Find DtpFunctionData for the operation ID, use it to invoke handling function
    std::map<int, FunctionData>::iterator iter = _vecFunctionData.find (operationId);
    if (iter == _vecFunctionData.end ())
        return EC_GENERAL_FAILURE;

    HANDLINGFUNC_1 handlingFunc = (*iter).second._clientHandlingFunc;
    POSTOPFUNC_1 postOpFunc = (*iter).second._clientPostOpFunc;
    bool callPostOpOnSuccess = (*iter).second._callPostOpOnSuccess;

    // Open a socket opposite the remote peer's TcpPortListener
    /* Start: ----- EXCEPTION 16: "mutex: Resource busy" ----- */
    boost::asio::io_service io_service;
    /* End: ----- EXCEPTION 16: "mutex: Resource busy" ----- */

    boost::asio::ip::tcp::socket socket (io_service);
    . . .
}
catch (boost::system::system_error& err)
{
    LOGLINE (("error", "Boost exception (%d / \"%s\") caught in HandleQueueOperation",  err.code ().value(), err.what()));
    return EC_EXCEPTION_CAUGHT;
}

The trace log line is:

18:37:04 ( 149077264) [error] Boost exception (16 / "mutex: Resource busy") caught in HandleQueueOperation

The exception is thrown somewhere between the "start" and "end" comments above, where the boost::asio::io_service object is defined. I've searched StackOverflow, Google, etc. for anything related to "mutex: Resource busy" but have found nothing. My code is not accessing any app-level mutexes at this point, so I assume the mutex referred to is a Boost-related one.

Can someone tell me what the message basically means, and why the "resource busy" exception is being thrown? Is there a known issue on Blackberry 10 related to the exception?

Thanks in advance!

Answer 1:

After much debugging a colleague finally solved the problem.

Executive summary

The exception was being thrown by pthread_mutex_init () after 55-65 boost::mutex constructor invocations. An application-level derived class had a boost::mutex as a member variable, but its base class destructor was non-virtual, so objects of the derived class were never fully destructed and their mutexes were never destroyed. The number of live boost::mutex objects therefore kept rising until the mutex exception was thrown. Once the derived class's destructor was correctly invoked, the mutex exceptions were no longer thrown.
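To make the link between pthread_mutex_init () and the caught exception concrete, here is a minimal sketch of the general pattern: a non-zero return code from the POSIX call is wrapped in a system_error whose code () is the errno value. It uses std::system_error rather than Boost so that it stands alone, and the helper names throwIfError and checkedMutexInit are hypothetical, not Boost functions. On our platform, code 16 with the text "Resource busy" corresponds to EBUSY.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <system_error>

// Hypothetical helper: turn a non-zero POSIX return code into an exception.
// The 'what' prefix ends up in front of the system message, e.g. "mutex: ...".
void throwIfError (int res, const char* what)
{
    if (res != 0)
        throw std::system_error (res, std::generic_category (), what);
}

// Hypothetical helper: initialise a mutex with default attributes,
// throwing if the system refuses (EBUSY, ENOMEM, EAGAIN, ...).
void checkedMutexInit (pthread_mutex_t& m)
{
    throwIfError (pthread_mutex_init (&m, nullptr), "mutex");
}
```

A caller that catches the exception can compare err.code ().value () against EBUSY or ENOMEM instead of relying on the message text, which differs between platforms.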

Relevant / interesting facts gleaned along the way

(1) An early theory was put forward that there were too many mutexes in the system and the application was exceeding some unknown restriction on the maximum number of synchronization objects allowed (although the QNX documentation clearly states the number of such objects is unlimited). To test this we modified the boost::mutex class from:

class mutex
{
private:
    . . .
public:
    mutex()
    {
        . . .
    }
    ~mutex()
    {
        . . .
    }
};

to:

class mutex
{
private:
    static int _nCount;
public:
    mutex()
    {
        ++_nCount;
        . . .
    }
    ~mutex()
    {
        . . .
        --_nCount;
    }
    static int getCount ()
    {
        return _nCount;
    }
    . . .
};

Note that access to the _nCount variable is not synchronized (we'd need a mutex object for this!), but calling the debugging boost::mutex::getCount() function from the application gave us the confidence that the number of mutexes was low at the time of the exception (55-65 active mutexes on average).

This technique of monitoring an object at the lowest level (e.g., mutexes within Boost) by adding static access functions is a good tool to consider when debugging sticky problems.
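As a sketch of the same counting technique outside Boost, the class below keeps a live-instance count in a std::atomic<int>, which sidesteps the synchronization caveat mentioned above. This assumes a C++11 compiler; CountedResource is a hypothetical stand-in for the class being monitored, not the modified boost::mutex itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a low-level class such as boost::mutex.
class CountedResource
{
public:
    CountedResource ()                        { ++liveCount (); }
    CountedResource (const CountedResource&)  { ++liveCount (); }
    ~CountedResource ()                       { --liveCount (); }
    CountedResource& operator= (const CountedResource&) = default;

    // Debug-only accessor: number of instances currently alive
    static int getCount ()
    {
        return liveCount ().load ();
    }

private:
    // A function-local static avoids a separate out-of-class definition
    static std::atomic<int>& liveCount ()
    {
        static std::atomic<int> count (0);
        return count;
    }
};
```

Logging getCount () at the point of failure shows whether instances are accumulating; a count that only ever grows points at a missing destructor call rather than at a system limit.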

(2) We occasionally received an ENOMEM exception, indicating a memory problem ("the system cannot allocate the resources required to create the mutex").

(3) A FreeBSD site posting from three months ago was remarkably similar to our symptoms:

I'm having troubles that I seem unable to resolve. My programs creates and destroys mutexes repeatably (and, obviously, uses them in between). Around the 60th lock created, I always get ENOMEM. I have free memory, lots of it. All locks get released properly.

Unfortunately the thread did not point us in a constructive direction.

(4) The breakthrough came when a careful study of the application's code turned up a derived object whose base class destructor was non-virtual, so the derived part of the object (including its boost::mutex member) was never destroyed. Making the base class destructor virtual fixed the leak and solved the mutex exceptions.

(5) Even after making the base class's destructor virtual we found that the derived class's destructor was not being called when compiling for Blackberry 10 using the QNX® Momentics Tool Suite. We "hacked" around this problem by specifying both the base and the derived destructors as virtual. Only then did the derived destructor get called. This may indicate an error in the QNX compiler's implementation of the C++ specification, which states clearly that virtual-ness propagates to derived classes (Working Draft, Standard for Programming Language C++ (2012), page 250, footnote 9).
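The fix described in (4) and (5) can be sketched as follows. All class names here are hypothetical; TrackedMember stands in for the boost::mutex member and simply counts live instances so the effect of the destructor call is observable.

```cpp
#include <cassert>

// Hypothetical member that counts live instances,
// standing in for a boost::mutex member variable.
struct TrackedMember
{
    static int& alive () { static int n = 0; return n; }
    TrackedMember ()  { ++alive (); }
    ~TrackedMember () { --alive (); }
};

class Base
{
public:
    // Without 'virtual' here, deleting a Derived through a Base*
    // is undefined behaviour and ~Derived () may never run.
    virtual ~Base () {}
};

class Derived : public Base
{
public:
    // 'virtual' is redundant according to the standard; it is
    // spelled out here to mirror the workaround in (5).
    virtual ~Derived () {}
private:
    TrackedMember _member;
};

// Create a Derived, destroy it through a Base pointer,
// and return how many TrackedMember objects are still alive.
int createAndDestroyViaBase ()
{
    Base* p = new Derived;
    delete p;
    return TrackedMember::alive ();
}
```

With the virtual destructor in place, createAndDestroyViaBase () leaves no TrackedMember alive, which is the behaviour we were missing.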

Edit: See this Stack Overflow post for another example of QNX's dropping the ball regarding virtual destructors.