Issue with std::thread when using g++ in 32-bit MinGW

BACKGROUND -- We develop C++11 code and write unit tests using gtest/gmock. The code is built on a Windows server using SCons and g++ in MinGW. We started having occasional problems when executing unit tests: silent exits, expectation errors, exception pop-ups... with no obvious pattern or commonality, and not easily reproducible. Eventually, a colleague narrowed it down to a case where a thread was apparently joined without ever having started to execute its payload function. In this case, there were no exceptions or the like; the test simply failed because an expectation was not met. I then wrote an even simpler test case involving neither our codebase nor gtest/gmock.

BRIEF QUESTION -- Consider the following code snippet:

bool flag(false);
std::thread worker( [&] () { flag = true; } );
worker.join();
assert(flag);

When executed once, this appears to work fine. By "once" I mean once per run of the test executable; the executable itself is then run many times in a row from a command file.

However, when executed repeatedly within the test itself, the above assertion would often fail: sometimes on the very second repetition, other times only after many thousands of repetitions.
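For reference, a minimal sketch of what "executed repeatedly within the test itself" can look like: the snippet wrapped in a loop inside a single process. The helper name `count_missed_runs` and the repetition count are illustrative assumptions, not the original test code. Since the completion of a thread synchronizes with the return of `join()`, reading a plain `bool` after `join()` is well defined here, so on a conforming implementation the function should always return 0.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: runs the create/join snippet `repetitions` times in
// one process and returns how many times the payload had not run by the
// time join() returned. A conforming implementation should return 0.
int count_missed_runs(int repetitions)
{
    int missed = 0;
    for (int i = 0; i < repetitions; ++i)
    {
        bool flag(false);
        std::thread worker([&] () { flag = true; });
        assert(worker.joinable());
        worker.join();  // thread completion synchronizes with join() returning
        if (!flag)
        {
            ++missed;  // payload never ran: the symptom described above
        }
    }
    return missed;
}
```

Calling it from `main` with a large count (for example 100000) and printing the result gives an in-process counterpart to the repeated runs of the executable.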

It appears that g++ std::thread does not behave well under MinGW (4.8.0/32): the thread is successfully created (i.e. no exceptions), it is joinable, and it can be joined. However, in some cases its payload function is never executed. I know MinGW does not have full POSIX pthreads, and I have already looked at Using threads with MinGW?, pthread_create not enough space, MinGW and std::thread, and similar questions, to no avail. We do use static linking (for a different reason), and I also found https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57740.

It all seems to point to a race condition in the thread implementation. Making both boolean flags in the test volatile and turning optimization off (-O0) made no difference.

We currently use g++ in 32-bit MinGW version 4.8.0 (out of the box from the Qt 5.1 installation) and are now considering moving to a different toolchain (e.g. gcc/g++ on a Linux box), or at least upgrading to a later MinGW if there is an indication that this problem has been fixed.

Is this a known problem with std::thread on MinGW? Are there any fixes or workarounds? (I mean general fixes. I have already implemented some case-by-case workarounds that seem to work, but I do not like them.)

FULL DETAILS -- Executing the code below we note that:

[A] So far, the test at #2 has never failed. [Expected]

[B] However, the test at #4 fails quite frequently (after a varying number of repetitions, sometimes after just two(!), though sometimes it takes thousands before the test fails). [Unexpected]

[C] Enabling only the wait at #1 makes the condition at #4 fail more often (the test at #2 still does not fail). [Unexpected]

[D] Enabling only the wait at #3 makes the tests at both #2 and #4 succeed. [Hmm...]

With the [D] "fix", after many thousands of repetitions, I have (twice so far) seen the dreaded R6016 error ("not enough space for thread data"). (In a way this is understandable, and perhaps not so worrying, as long as thread resources are recovered periodically between tests and tests are not run back-to-back.)

Note that the "waits" at #1 and #3 are only for illustration: they have no time-out and could potentially hang.
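If a bounded version of those waits is wanted, here is a minimal sketch. It assumes the flag is a `std::atomic<bool>` (as the EDIT notes, polling a plain `bool` is not correct) and that the caller picks the time-out; `wait_for_flag` is a hypothetical helper, not part of the original test.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: spins (yielding) until `flag` becomes true or the
// time-out expires. Returns true if the flag was observed set in time.
bool wait_for_flag(const std::atomic<bool>& flag,
                   std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!flag.load())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;  // gave up: the payload did not run in time
        }
        std::this_thread::yield();  // let the worker get scheduled
    }
    return true;
}
```

With `flag2` declared as `std::atomic<bool>`, the wait at #3 would become `if (not wait_for_flag(flag2, std::chrono::seconds(5))) { puts("Timed out"); }` placed before `worker2.join()`.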

#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <thread>

int main(int, char *[])
{
   bool flag1(false);
   assert(not flag1);

   std::thread worker1( [&] () { flag1 = true; } );
   assert(worker1.joinable());

// while (not flag1) { std::this_thread::yield(); } // #1: MAKES #4 FAIL MORE OFTEN

   worker1.join();
   if (not flag1) // #2: DOES NOT FAIL
   {
      puts("Oops on first!");
      exit(EXIT_FAILURE);
   }

   bool flag2(false);
   assert(not flag2);

   std::thread worker2( [&] () { flag2 = true; } );
   assert(worker2.joinable());

// while (not flag2) { std::this_thread::yield(); } // #3: MAKES #4 SUCCEED

   worker2.join();
   if (not flag2) // #4: SOMETIMES FAILS
   {
      puts("Oops on second!");
      exit(EXIT_FAILURE);
   }

   puts("Both OKAY");
   return EXIT_SUCCESS;
}

Compiled into test.exe, the above test can be run repeatedly using the following command file:

@ECHO OFF
FOR /L %%i IN (1,1,1000000) DO (
   ECHO __ %%i ________________________________________________________________________________ %%i __
   test.exe
   IF ERRORLEVEL 1 GOTO gameover
)
:gameover

EDIT

As @TC pointed out, using a plain bool is not correct (polling it at #1 and #3 while the worker thread writes it is a data race). Originally, I used atomic_bool, with the same behavior as described above; I then wrongly "simplified" the example to bool.
By the way, using only yield at #1 and #3, without checking the flag, is not sufficient.
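For completeness, a sketch of one stage of the test in the atomic_bool form described in this EDIT, with the flag re-checked inside the yield loop. This is a reconstruction under assumptions, not the original atomic_bool code: `run_stage` is a hypothetical name, and `wait_first` selects whether the wait at #1/#3 is enabled.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical reconstruction of one stage of the test using std::atomic<bool>
// (std::atomic_bool) instead of a plain bool. When wait_first is true, the
// main thread spins on the flag before joining, as at #1 and #3.
bool run_stage(bool wait_first)
{
    std::atomic<bool> flag(false);
    std::thread worker([&] () { flag = true; });
    assert(worker.joinable());

    if (wait_first)
    {
        // Yield alone is not enough; the loop must re-check the flag.
        while (!flag.load())
        {
            std::this_thread::yield();
        }
    }

    worker.join();
    return flag.load();  // false would mean the payload never ran
}
```

Calling `run_stage(false)` and then `run_stage(true)` from `main` corresponds to scenario [D]; calling `run_stage(false)` twice corresponds to [B].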

Thanks a lot for the very detailed analysis and for the great example!
I've checked this example with x86_64-w64-mingw32-g++ (GCC) 4.8.2,
flags: -c -pipe -fno-keep-inline-dllexport -m64 -g -frtti -Wall -Wextra -fexceptions -mthreads
running under:

Windows 7 with flag -std=c++0x:
It failed fairly early each time, at "Oops on second!" (loop iterations 293, 805, 1632, 276).

Windows 7 with flag -std=c++11:
It failed fairly early each time, at "Oops on second!" (loop iterations 4, 257, 613, 49).

Windows 10 with flag -std=c++0x:
It failed only after a long time (loop iteration 44924), at "Oops on second!".

Windows 10 with flag -std=c++11:
It failed only after a long time (loop iterations 7389, 41907), at "Oops on second!".

No optimizations were used. The tests were done in VirtualBox with clean installations of Windows 7/10, without updates.
The test executables required the libraries:

libstdc++-6.dll
libwinpthread-1.dll
libgcc_s_sjlj-1.dll

So it is definitely much more stable under Windows 10, but not flawless.
Using c++11 instead of c++0x might be less stable under Windows 7, but I've run too few tests to be certain about that.

Has anybody tried with a newer version of MinGW?