How does the performance of std::mutex compare to CRITICAL_SECTION? Is it on par?
I need a lightweight synchronization object (it doesn't need to be an interprocess object). Is there any STL class that is close to CRITICAL_SECTION, other than std::mutex?
I'm using Visual Studio 2013.
My results in single-threaded usage look similar to waldez's results:
1 million lock/unlock calls:
The reason Microsoft changed the implementation is C++11 compatibility. C++11 has 4 kinds of mutexes in the std namespace:
Microsoft's std::mutex and all the other mutexes are wrappers around a critical section:
In my opinion, std::recursive_mutex should completely match a critical section, so Microsoft should optimize its implementation to use less CPU and memory.
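The single-threaded measurement mentioned above (1 million lock/unlock calls) can be approximated with a small template. This is only a sketch, not the code that produced the numbers in this answer: the name time_uncontended is hypothetical, and it assumes std::chrono::steady_clock is precise enough for the loop being timed.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>

// Hypothetical helper: times `iterations` uncontended lock/unlock pairs on a
// single mutex of type Mutex and returns the elapsed time in milliseconds.
template <typename Mutex>
double time_uncontended(std::size_t iterations) {
    Mutex m;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        m.lock();    // never blocks: only one thread touches m
        m.unlock();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Instantiating it with each of the four C++11 types (std::mutex, std::timed_mutex, std::recursive_mutex, std::recursive_timed_mutex), for example time_uncontended<std::recursive_mutex>(1000000), gives a rough per-type cost. The absolute numbers depend heavily on the compiler, toolset, and optimization settings.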
Please see my updates at the end of the answer, the situation has dramatically changed since Visual Studio 2015. The original answer is below.
I made a very simple test, and according to my measurements std::mutex is around 50-70x slower than CRITICAL_SECTION.
Edit: After some more tests, it turned out that it depends on the number of threads (contention) and the number of CPU cores. Generally, std::mutex is slower, but how much slower depends on the use. Following are the updated test results (tested on a MacBook Pro with Core i5-4258U, Windows 10, Boot Camp):
Following is the code that produced this output. It was compiled with Visual Studio 2012, default project settings, Win32 release configuration. Please note that this test may not be perfectly correct, but it made me think twice before switching my code from CRITICAL_SECTION to std::mutex.
Update 10/27/2017 (1): Some answers suggest that this is not a realistic test or does not represent a "real world" scenario. That's true: this test tries to measure the overhead of std::mutex; it's not trying to prove that the difference is negligible for 99% of applications.
Update 10/27/2017 (2): It seems the situation has changed in favor of std::mutex since Visual Studio 2015 (VC140). I used the VS2017 IDE, exactly the same code as above, x64 release configuration, optimizations disabled, and I simply switched the "Platform Toolset" for each test. The results are very surprising, and I am really curious what has changed in VC140.
Same test program by waldez, modified to run with pthreads and boost::mutex.
On Windows 10 Pro (with an Intel i7-7820X 16-core CPU), I get better results from std::mutex on VS2015 Update 3 (and even better from boost::mutex) than from CRITICAL_SECTION:
Results for pthreads are here.
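For readers who want to reproduce this kind of comparison, here is a minimal sketch of a fully contended test: every thread does nothing but take the lock and bump a shared counter. It is not the original test program. The names CriticalSectionLock, ContendedResult and contended_increment are hypothetical, and the CRITICAL_SECTION wrapper is only compiled on Windows.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

#ifdef _WIN32
#include <windows.h>
// Hypothetical wrapper: gives CRITICAL_SECTION the lock()/unlock() interface
// that std::lock_guard expects, so both primitives run through the same code.
class CriticalSectionLock {
public:
    CriticalSectionLock() { InitializeCriticalSection(&cs_); }
    ~CriticalSectionLock() { DeleteCriticalSection(&cs_); }
    CriticalSectionLock(const CriticalSectionLock&) = delete;
    CriticalSectionLock& operator=(const CriticalSectionLock&) = delete;
    void lock() { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
};
#endif

struct ContendedResult {
    long long counter;    // final value of the shared counter
    double elapsed_ms;    // wall-clock time for all threads to finish
};

// Starts `thread_count` threads; each increments one shared counter
// `per_thread` times while holding the lock (close to 100% contention).
template <typename Lock>
ContendedResult contended_increment(unsigned thread_count, std::size_t per_thread) {
    Lock lock;
    long long counter = 0;
    std::vector<std::thread> threads;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (std::size_t i = 0; i < per_thread; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    const auto stop = std::chrono::steady_clock::now();
    return {counter, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Calling contended_increment<std::mutex>(4, 1000000) and, on Windows, contended_increment<CriticalSectionLock>(4, 1000000) yields comparable timings. Swapping the "Platform Toolset" or compiler between runs is what changes the outcome in the answers above.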
The test by waldez here is not realistic; it basically simulates 100% contention. In general, this is exactly what you don't want in multi-threaded code. Below is a modified test which does some shared calculations. The results I get with this code are different:
You can see here that for me (using VS2013) the figures are very close between std::mutex and CRITICAL_SECTION. Note that this code does a fixed number of tasks (160,000) which is why the performance improves generally with more threads. I've got 12 cores here so that's why I stopped at 12.
I'm not saying this is right or wrong compared to the other test, but it does highlight that timing issues are generally domain-specific.
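The shape of such a test, a fixed number of tasks shared among a varying number of threads with the lock held only briefly, can be sketched as follows. This is an illustration under assumptions, not the code behind the figures above: do_task is a made-up stand-in for the "shared calculations", and run_fixed_workload is a hypothetical name.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Made-up unit of work: sum of squares 1..n, computed without any lock held.
inline unsigned long long do_task(unsigned n) {
    unsigned long long acc = 0;
    for (unsigned i = 1; i <= n; ++i) acc += static_cast<unsigned long long>(i) * i;
    return acc;
}

// Runs exactly `total_tasks` tasks, shared among `thread_count` threads.
// The lock protects only the task counter and the shared sum, so the time
// spent inside the lock is small compared to the time spent in do_task.
template <typename Lock>
unsigned long long run_fixed_workload(std::size_t total_tasks, unsigned thread_count) {
    Lock lock;
    std::size_t next_task = 0;
    unsigned long long shared_sum = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                {
                    std::lock_guard<Lock> guard(lock);
                    if (next_task == total_tasks) return;  // nothing left to claim
                    ++next_task;                           // claim one task
                }
                const unsigned long long result = do_task(100);  // work outside the lock
                std::lock_guard<Lock> guard(lock);
                shared_sum += result;                      // publish the result
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared_sum;
}
```

Because the total amount of work is fixed, adding threads shortens the run until the core count is reached, and the lock type matters less the larger do_task is relative to the locked sections.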
I was searching here for pthread vs. critical section benchmarks. However, as my results turned out to be different from waldez's answer on this topic, I thought it would be interesting to share them.
The code is the one used by @waldez, modified to add pthreads to the comparison, compiled with GCC and no optimizations. My CPU is AMD A8-3530MX.
Windows 7 Home Edition:
As you can see, the difference is well within statistical error: sometimes std::mutex is faster, sometimes it's not. What's important is that I do not observe as big a difference as the original answer did.
I think the reason may be that when the answer was posted, the MSVC compiler wasn't good with newer standards; note that the original answer used the 2012 version.
Also, out of curiosity, the same binary under Wine on Arch Linux:
waldez's code with my modifications: