Massive CPU load using std::lock (c++11)

2020-02-02 04:38发布

问题:

My recent efforts to implement a thread/ mutex manager ended up in an 75% CPU load (4 core), while all four running threads were either in sleep or waiting for a mutex beeing unlocked.

The specific class is far too large for being posted here entirely, but I could narrow down the cause to the deadlock-safe acquiring of two mutexes

std::unique_lock<std::mutex> lock1( mutex1, std::defer_lock );
std::unique_lock<std::mutex> lock2( mutex2, std::defer_lock );
std::lock( lock1, lock2 );

Another part of the class uses a std::condition_variable with wait() and notify_one() on mutex1 for some code to be executed selectively at the same time.

The simple change to

std::unique_lock<std::mutex> lock1( mutex1 );
std::unique_lock<std::mutex> lock2( mutex2 );

brought the CPU usage down to normal 1-2%.

I Cant believe, the std::lock() function is that inefficient. Could this be a bug in g++ 4.6.3?

edit: ( example )

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mutex1, mutex2;
std::condition_variable cond_var;

bool cond = false;
std::atomic<bool>done{false};

using namespace std::chrono_literals;

void Take_Locks()
    {
    while( !done )
        {
        std::this_thread::sleep_for( 1s );

        std::unique_lock<std::mutex> lock1( mutex1, std::defer_lock );
        std::unique_lock<std::mutex> lock2( mutex2, std::defer_lock );
        std::lock( lock1, lock2 );

        std::this_thread::sleep_for( 1s );
        lock1.unlock();
        lock2.unlock();
        }
    }

void Conditional_Code()
    {
    std::unique_lock<std::mutex> lock1( mutex1, std::defer_lock );
    std::unique_lock<std::mutex> lock2( mutex2, std::defer_lock );

    std::lock( lock1, lock2 );
    std::cout << "t4: waiting \n";

    while( !cond )
        cond_var.wait( lock1 );

    std::cout << "t4: condition met \n";
    }

int main()
    {
    std::thread t1( Take_Locks ), t2( Take_Locks ), t3( Take_Locks );
    std::thread t4( Conditional_Code );

    std::cout << "threads started \n";
    std::this_thread::sleep_for( 10s );

    std::unique_lock<std::mutex> lock1( mutex1 );
    std::cout << "mutex1 locked \n" ;
    std::this_thread::sleep_for( 5s );

    std::cout << "setting condition/notify \n";
    cond = true;
    cond_var.notify_one();
    std::this_thread::sleep_for( 5s );

    lock1.unlock();
    std::cout << "mutex1 unlocked \n";
    std::this_thread::sleep_for( 6s );

    done = true;
    t4.join(); t3.join(); t2.join(); t1.join();
    }

回答1:

On my machine, the following code prints out 10 times a second and consumes almost 0 cpu because most of the time the thread is either sleeping or blocked on a locked mutex:

#include <chrono>
#include <thread>
#include <mutex>
#include <iostream>

using namespace std::chrono_literals;

std::mutex m1;
std::mutex m2;

void
f1()
{
    while (true)
    {
        std::unique_lock<std::mutex> l1(m1, std::defer_lock);
        std::unique_lock<std::mutex> l2(m2, std::defer_lock);
        std::lock(l1, l2);
        std::cout << "f1 has the two locks\n";
        std::this_thread::sleep_for(100ms);
    }
}

void
f2()
{
    while (true)
    {
        std::unique_lock<std::mutex> l2(m2, std::defer_lock);
        std::unique_lock<std::mutex> l1(m1, std::defer_lock);
        std::lock(l2, l1);
        std::cout << "f2 has the two locks\n";
        std::this_thread::sleep_for(100ms);
    }
}

int main()
{
    std::thread t1(f1);
    std::thread t2(f2);
    t1.join();
    t2.join();
}

Sample output:

f1 has the two locks
f2 has the two locks
f1 has the two locks
...

I'm running this on OS X and the Activity Monitor application says that this process is using 0.1% cpu. The machine is a Intel Core i5 (4 core).

I'm happy to adjust this experiment in any way to attempt to create live-lock or excessive cpu usage.

Update

If this program is using excessive CPU on your platform, try changing it to call ::lock() instead, where that is defined with:

template <class L0, class L1>
void
lock(L0& l0, L1& l1)
{
    while (true)
    {
        {
            std::unique_lock<L0> u0(l0);
            if (l1.try_lock())
            {
                u0.release();
                break;
            }
        }
        std::this_thread::yield();
        {
            std::unique_lock<L1> u1(l1);
            if (l0.try_lock())
            {
                u1.release();
                break;
            }
        }
        std::this_thread::yield();
    }
}

I would be interested to know if that made any difference for you, thanks.

Update 2

After a long delay, I have written a first draft of a paper on this subject. The paper compares 4 different ways of getting this job done. It contains software you can copy and paste into your own code and test yourself (and please report back what you find!):

http://howardhinnant.github.io/dining_philosophers.html



回答2:

As the documentation says, [t]he objects are locked by an unspecified series of calls to lock, try_lock, unlock. There is simply no way that can possibly be efficient if the mutexes are held by other threads for a significant period of time. There's no way the function can wait without spinning.



回答3:

The std::lock() non-member function may cause Live-lock problem or performance degradation, it guarantee only "Never Dead-lock".

If you can determine "Lock order(Lock hierarchy)" of multiple mutexes by design, it's preferable to not use generic std::lock() but lock each mutexes in pre-determinate order.

Refer to Acquiring Multiple Locks Without Deadlock for more detail.



回答4:

First I want to thank for all answers.

During the work on an example code, that reproduces the effect, I found the source of trouble.

The conditional part locks both mutexes, while it uses just one for the std::condition_variable::wait() function.

But I still wonder, what going on behind the scene, that produces such a high CPU load.