BACKGROUND -- We develop C++11 code and write unit tests using gtest/gmock. The code is built on a Windows server using SCons and g++ in MinGW. We started having occasional problems when executing unit tests: silent exits, expectation errors, exception pop-ups... with no obvious pattern or commonality, and not easily reproducible. Eventually, a colleague narrowed it down to a case where a thread was apparently joined without ever starting to execute its payload function. In this case, there were no exceptions or the like; the test simply failed because an expectation was not met. I then made a simpler test case involving neither our codebase nor gtest/gmock.
BRIEF QUESTION -- Consider the following code snippet:
bool flag(false);
std::thread worker( [&] () { flag = true; } );
worker.join();
assert(flag);
When executed once, this appears to work fine. By "once" I mean once per run of the test executable; the executable itself is then run many times from a command file.
However, when executed repeatedly within the test itself, the above assertion often fails: sometimes on the very second repetition, other times only after many thousands of repetitions.
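For reference, the in-process repetition described above can be sketched roughly as follows. This is only an illustration, not our actual test: the helper name `first_failed_repetition` and the repetition count are made up for the example, and it uses `std::atomic<bool>` for the flag (as in my original version, see the EDIT at the end). On a conforming implementation the helper should always return `repetitions`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper (not part of the original test): repeats the
// create/join/check cycle and returns the index of the first repetition
// in which the payload did not set the flag, or `repetitions` if the
// payload ran every time.
std::size_t first_failed_repetition(std::size_t repetitions)
{
    for (std::size_t i = 0; i < repetitions; ++i)
    {
        std::atomic<bool> flag(false);
        std::thread worker([&flag]() { flag.store(true); });
        assert(worker.joinable());
        worker.join();          // after join() the payload must have completed
        if (!flag.load())
        {
            return i;           // payload was never executed in repetition i
        }
    }
    return repetitions;         // no failure observed
}
```

Calling `first_failed_repetition(100000)` from `main` and comparing the result with `100000` gives the same kind of check as the snippet above, just repeated inside a single process.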
It appears that g++ std::thread does not behave well under MinGW (4.8.0/32) -- the thread is created successfully (i.e. no exceptions), it is joinable, and it can be joined. However, in some cases its payload function is never executed. -- I know MinGW does not have full POSIX pthreads and I have already looked at "Using threads with MinGW?", "pthread_create not enough space", "MinGW and std::thread", and similar questions, to no avail. We do use static linking (for a different reason) and I also found https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57740.
It all points to a race condition in the thread implementation. Making both boolean flags in the test volatile and turning optimization off (-O0) made no difference.
We currently use g++ in 32-bit MinGW version 4.8.0 (out-of-the-box from the QT5.1 installation) and are now considering a move to a different toolchain (e.g. gcc/g++ on a Linux box), or at least an upgrade to a later MinGW if there is an indication that this problem might have been fixed.
Is this a known problem with std::thread on MinGW? Are there any fixes or work-arounds? (I mean general fixes. I have already implemented some case-by-case work-arounds that seem to work, but I do not like them.)
FULL DETAILS -- Executing the code below we note that:
[A] So far, the test at #2 has never failed. [Expected]
[B] However, the test at #4 fails quite frequently, after a varying number of repetitions: sometimes after just two(!), sometimes only after thousands. [Unexpected]
[C] Enabling only the wait at #1 results in more failures of the condition at #4 (the test at #2 still does not fail). [Unexpected]
[D] Enabling only the wait at #3 makes the tests at both #2 and #4 succeed. [Hmm...]
With the [D] "fix" and after many thousands of repetitions, I have seen (twice so far) the dreaded R6016 (not enough space for thread data). (In a way, this is understandable and perhaps not so worrying as long as thread resources are recovered periodically between tests and tests are not run back-to-back.)
Note that the "waits" at #1 and #3 are for illustration only: they have no time-out and could hang.
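If the waits were to be kept as a work-around, they would need a time-out. A minimal sketch of such a bounded wait is shown here; the names `wait_for_flag` and `payload_ran_within` and the time-out values are hypothetical, and the flag is assumed to be a `std::atomic<bool>`. This only bounds the spin-wait; it does not address the underlying problem.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: yields until `flag` becomes true or `timeout`
// elapses. Returns true if the flag was observed set, false on time-out.
bool wait_for_flag(const std::atomic<bool>& flag,
                   std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!flag.load())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return flag.load();   // one last check before giving up
        }
        std::this_thread::yield();
    }
    return true;
}

// Hypothetical usage: same shape as the waits at #1 and #3, but bounded.
bool payload_ran_within(std::chrono::milliseconds timeout)
{
    std::atomic<bool> flag(false);
    std::thread worker([&flag]() { flag.store(true); });
    assert(worker.joinable());
    const bool ran = wait_for_flag(flag, timeout);
    worker.join();                // join before `flag` goes out of scope
    return ran;
}
```

With this, a payload that never starts shows up as a `false` return after the time-out instead of a hang.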
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <thread>
int main(int, char *[])
{
    bool flag1(false);
    assert(not flag1);
    std::thread worker1( [&] () { flag1 = true; } );
    assert(worker1.joinable());
    // while (not flag1) { std::this_thread::yield(); } // #1: MAKES #4 FAIL MORE OFTEN
    worker1.join();
    if (not flag1) // #2: DOES NOT FAIL
    {
        puts("Oops on first!");
        exit(EXIT_FAILURE);
    }

    bool flag2(false);
    assert(not flag2);
    std::thread worker2( [&] () { flag2 = true; } );
    assert(worker2.joinable());
    // while (not flag2) { std::this_thread::yield(); } // #3: MAKES #4 SUCCEED
    worker2.join();
    if (not flag2) // #4: SOMETIMES FAILS
    {
        puts("Oops on second!");
        exit(EXIT_FAILURE);
    }

    puts("Both OKAY");
    return EXIT_SUCCESS;
}
Compiled into test.exe, the above test can be run repeatedly using:
@ECHO OFF
FOR /L %%i IN (1,1,1000000) DO (
ECHO __ %%i ________________________________________________________________________________ %%i __
test.exe
IF ERRORLEVEL 1 GOTO gameover
)
:gameover
EDIT
- As @TC pointed out, using bool is not correct. Originally, I used atomic_bool, with the same behavior as described above. I then wrongly "simplified" the example to bool.
- BTW, using only yield without checking the flag at #1 and #3 is not sufficient.