Is lockless hashing without std::atomics guarantee

2019-07-22 01:58发布

问题:

Consider the following attempt at a lockless hashtable for multithreaded search algorithms (inspired by this paper)

struct Data
{
    uint64_t key;
    uint64_t value;
};

struct HashEntry
{
    uint64_t key_xor_value;
    uint64_t value;
};

void insert_data(Data const& e, HashEntry* h, std::size_t tableOffset)
{
    h[tableOffset].key_xor_value = e.key ^ e.value;
    h[tableOffset].value = e.value;
}

bool data_is_present(Data const& e, HashEntry const* h, std::size_t tableOffset)
{
    auto const tmp_key_xor_value = h[tableOFfset].key_xor_value;
    auto const tmp_value = h[tableOffset].value;

    return e.key == (tmp_key_xor_value ^ tmp_value);
}

The idea is that a HashEntry struct stores the XOR-ed combination of the two 64-bit words of a Data struct. If two threads have interleaved reads/writes to the two 64-bit words of a HashEntry struct, the idea is that this can be detected by the reading thread by XOR-ing again and comparing against the original key. So one might have a loss of efficiency by corrupted hash entries, but still have guaranteed correctness in case the decoded retrieved key matches the original.

The paper mentions that it is based on the following assumption:

For the remainder of this discussion, assume that 64 bit memory read/write operations are atomic, that is the entire 64 bit value is read/written in one cycle.

My questions are: is the above code without explicit use of std::atomic<uint64_t> guaranteed to be thread-safe in C++11? Or can the individual 64-bit words be corrupted by simultaneous reads/writes? Even on 64-bit platforms? And how is this different from the old C++98 Standard?

Quotes from the Standard would be much appreciated.

UPDATE: based on this amazing paper by Hans Boehm on "benign" data races, a simple way to get bitten is for the compiler to cancel both XORs from insert_data() and data_is_present() to alway return true, e.g. if it finds a local code fragment like

insert_data(e, h, t);
if (data_is_present(e, h, t)) // optimized to true as if in single-threaded code
   read_and_process(e, h, t); // data race if other thread has written

回答1:

The C++11 specification defines pretty much any attempt by one thread to read or write a memory location that another thread is writing to as undefined behavior (absent the use of atomics or mutexes to prevent read/writes from one thread while another thread is writing).

Individual compilers may make it safe, but the C++11 specification doesn't provide coverage itself. Simultaneous reads are never a problem; it's writing in one thread while reading/writing in another.

And how is this different from the old C++98 Standard?

The C++98/03 standard doesn't provide any coverage with regard to threading. As far as the C++98/03 memory model is concerned, threading is not a thing that can possibly happen.



回答2:

I dont think it depends so much on the compiler as on the CPU (its instruction set) you are using. I wouldnt think the assumption would be very portable.



回答3:

The code's totally broken. The compiler's has substantial freedom to reorder instructions if its analysis suggests the overall effect is identical. In insert_data for example, there's no guarantee that key_xor_value will be updated before the value, whether the updates are done on temporary registers before being written back into the cache, let alone when those cache updates - whatever their "order" in the machine code language and CPU instruction execution pipeline - will be flushed from the updating core's or cores' (if context-switched mid-function) private caches to become visible to other threads. The compiler might even do the updates in steps using 32 bit registers, depending on the CPU, whether compiling 32-bit or 64-bit, compilation options etc..

Atomic operations tend to require something like CAS (Compare and Swap) style instructions, or volatile and memory barrier instructions, that sync data across cores' caches and enforce some ordering.