At which level does one unit test lock-free code?

Can LLVM, QEMU, GDB, Bochs, OpenStack or the like be used to unit test lock-free concurrent code on an open-source platform? Has anyone achieved this?

If you answer by recommending software, I don't mind, but I mention LLVM, QEMU and the others because these function at various different levels. I should like to learn at which level practical success has been found at interleaving threads under unit-test control.

I am aware of SPIN/Promela, incidentally. That is fine software but one cannot compile C++, Rust, etc., onto a SPIN/Promela target as far as I know.

Examples of existing, open-source unit tests of lock-free concurrent code would be gladly received, if you know any. (I would fetch the source and study it if I knew where to look.)

(See also these questions and their answers.)

EXAMPLE

My question does not require an example as far as I know, so you can ignore this one. However, in case an example of testable lock-free code were helpful for purpose of discussion, here is a relatively brief toy example in C++. I have no unit test for it.

#include <atomic>
#include <thread>
#include <cstdlib>
#include <iostream>

const int threshold     =  0x100;
const int large_integer = 0x1000;

// Gradually increase the integer to which q points until it reaches the
// threshold.  Then, release.
void inflate(std::atomic_bool *const p_atom, int *const q)
{
    while (*q < threshold) ++*q;
    p_atom->store(true, std::memory_order_release);
}

int main()
{
    std::atomic_bool atom{false};
    int n{0};

    // Dispatch the inflator, letting it begin gradually, in the background, to
    // inflate the integer n.
    std::thread inflator(inflate, &atom, &n);

    // Waste some time....
    for (int i = large_integer; i; --i) {}

    // Spin until the inflator has released.
    {
        int no_of_tries = 0;
        while (!atom.load(std::memory_order_acquire)) ++no_of_tries;
        std::cout << "tried " << no_of_tries << " times" << std::endl;
    }

    // Verify that the integer n has reached the threshold.
    if (n == threshold) {
        std::cout << "succeeded" << std::endl;
    }
    else {
        std::cout << "failed" << std::endl;
        std::cerr << "error"  << std::endl;
        std::exit(1);
    }

    inflator.join();
    return 0;
}

CLARIFICATION BY PETER CORDES

@PeterCordes precisely clarifies my question:

There can be cases where some source compiles to safe x86 asm with any reasonable compiler, but unsafe for weakly-ordered ISAs, which are also usually capable of performing an atomic RMW without a full seq-cst memory barrier (for run-time reordering; compile-time is still up to the compiler). So then you have two separate questions: Is the source portable to arbitrary C++11 systems, and is your code actually safe on x86 (if that's all you care about for now).

Both questions are interesting to me, but I had arbitrary C++11 systems in mind.

Usually you want to write code that's portably correct, because it usually doesn't cost any more when compiled for x86.

Reference: the draft C++17 standard, n4659 (6 MB PDF), well explains the C++11 concurrency model to which Peter refers. See sect. 4.7.1.

INQUIRY BY DIRK HERRMANN

@DirkHerrmann asks a pertinent question:

You ask about how to unit-test your code, but I am not sure that what you describe is truly a unit-testing scenario. Which does not mean you could not use any of the so-called unit-testing frameworks (which can in fact be used for all kinds of tests, not just unit-tests). Could you please explain what the goal of your tests would be, that is, which properties of the code you want to check?

Your point is well taken. The goal of my test would be to flunk bad code reliably for all possible timings the C++11 concurrency model supports. If I know that the code is bad, then I should be able to compose a unit test to flunk it. My trouble is this:

Unthreaded. I can normally compose a unit test to flunk bad code if the code is unthreaded.
Threaded. To flunk bad, threaded code is harder, but as long as mutexes coordinate the threading, at least the code runs similarly on divergent hardware.
Lock-free. To flunk bad, lock-free code might be impossible on particular hardware. What if my bad, lock-free code fails once in a billion runs on your hardware and never fails on mine? How can one unit test such code?

I don't know what I need, really. Insofar as my x86 CPU does not provide a true C++11 concurrency model, maybe I need an emulator for a nonexistent CPU that does provide a true C++11 concurrency model. I am not sure.

If I did have an emulator for a nonexistent CPU that provided a true C++11 concurrency model, then my unit test would (as far as I know) need to try my code under all possible, legal timings.

This is not an easy problem. I wonder whether anyone has solved it.

UPDATE: CDSCHECKER AND RELACY

The discussion has led me to investigate various sources, including

CDSChecker, open-source software by Norris and Demsky; and
Relacy Race Detector, open-source software by Vyukov, earlier discussed here.

At this writing, I do not know whether these answer my question but they look promising. I link them here for reference and further investigation.

For completeness of reference, I also add

SPIN/Promela,

already linked above.

Interesting question!

At which level does one unit test lock-free code?

The unsatisfying answer is: You cannot really test "lock-free concurrent code" as you called it.

No wait, of course you can: Test it by using a single piece of paper and a pen. Try to prove it is correct. The design level is the correct level to test multithreaded code.

Of course you can write unit tests for your code, and you really should do that, but there are virtually no means to achieve 100% coverage for all possible concurrent execution scenarios.

You can (and should) try to torment your code, run it on different architectures (e.g. x86 is so coherent, it will hide many concurrency problems. Run it on ARM in addition.). And you will still fail to find all errors.

The fundamental rule is: You cannot use testing to assure any quality level of multithreaded code (lock-free or also with locks). The only way to 100% assure correctness is to formally prove the correctness of your code, and this usually means you have a very simple thread design, which is so obvious that everybody understands it within 5 minutes. And then you write your code accordingly.

Don't get me wrong: Testing is useful. But it gets you nowhere with multithreading.

Why is this? Well, first of all the unit test approach does not work: Mutexes do not compose.

When you combine 100% correctly working multithreaded subsystems A and B, the result is not guaranteed to work at all. Mutexes do not compose. Condition variables do not compose. Invariants between threads do not compose. There is only very few and very limited primitives, like thread-safe queues, which compose. But testing aspects in isolation in unit tests assumes that things compose, like functions or classes.