tbb:concurrent_hash_map: sample code for Intel

2020-08-01 06:07发布

问题:

Looking for sample code to use tbb::concurrent_hash_map<K,V> from Intel Threading Building Blocks (TBB).

I can insert, but I cannot seem to read the values back.

The official Intel documentation appears to be somewhat lacking on the sample code side.

Update

The best docs are in "Pro TBB: C++ Parallel Programming with Threading Building Blocks" by Voss. Download this book for free (it's public domain).

Ignore the Intel docs. They are essentially a collection of function signatures.

回答1:

Intel TBB is open source, and on GitHub:

https://github.com/intel/tbb

To install TBB, I used vcpkg which is compatible with Linux, Windows and Mac. Yes, vcpkg is from Microsoft, but it is 100% cross-platform, open source, and very popular.

Linux:

./vcpkg search tbb              # Find the package.
./vcpkg install tbb:x64-linux   # Install the package.

Windows:

vcpkg search tbb                # Find the package.
vcpkg install tbb:x64-windows   # Install the package.

Compile:

  • Compatible with any modern compiler including MSVC, GCC, LLVM, Intel Compiler (ICC), etc. I used CMake for gcc.

Can also download the source and extract the headers and libraries into the source tree, this works just as well.

Code.

#include "tbb/concurrent_hash_map.h" // For concurrent hash map.

tbb::concurrent_hash_map<int, string> dict;
typedef tbb::concurrent_hash_map<int, string>::accessor dictAccessor; // See notes on accessor below.   

print("  - Insert key, method 1:\n");   
dict.insert({1,"k1"});
print("    - 1: k1\n");

print("  - Insert key, method 2:\n");
dict.emplace(2,"k2");
print("    - 2: k2\n");

string result;

{
    print("  - Read an existing key:\n");   
    dictAccessor accessor;
    const auto isFound = dict.find(accessor, 2);
    // The accessor functions as:
    // (a) a fine-grained per-key lock (released when it goes out of scope).
    // (b) a method to read the value.
    // (c) a method to insert or update the value.
    if (isFound == true) {
        print("    - {}: {}\n", accessor->first, accessor->second);
    }
}

{
    print("  - Atomically insert or update a key:\n");  
    dictAccessor accessor;
    const auto itemIsNew = dict.insert(accessor, 4);
    // The accessor functions as:
    // (a) a fine-grained per-key lock (released when it goes out of scope).
    // (b) a method to read the value.
    // (c) a method to insert or update the value.
    if (itemIsNew == true) {
        print("    - Insert.\n");
        accessor->second = "k4";
    }
    else {
        print("    - Update.\n");
        accessor->second = accessor->second + "+update";
    }
    print("    - {}: {}\n", accessor->first, accessor->second);     
}

{
    print("  - Atomically insert or update a key:\n");          
    dictAccessor accessor;
    const auto itemIsNew = dict.insert(accessor, 4);
    // The accessor functions as:
    // (a) a fine-grained per-key lock which is released when it goes out of scope.
    // (b) a method to read the value.
    // (c) a method to insert or update the value.
    if (itemIsNew == true) {
        print("    - Insert.\n");
        accessor->second = "k4";
    }
    else {
        print("    - Update.\n");
        accessor->second = accessor->second + "+update";
    }
    print("    - {}: {}\n", accessor->first, accessor->second);     
}

{
    print("  - Read the final state of the key:\n");            
    dictAccessor accessor;
    const auto isFound = dict.find(accessor, 4);
    print("    - {}: {}\n", accessor->first, accessor->second);
}

Printing uses {fmtlib} for printing; can replace with cout <<.

Output:

- Insert key, method 1:
  - 1: k1
- Insert key, method 2:
  - 2: k2
- Read an existing key:
  - 2: k2
- Atomically insert or update a key:
  - Insert.
  - 4: k4
- Atomically insert or update a key:
  - Update.
  - 4: k4+update
- Read the final state of the key:
  - 4: k4+update

Other hash maps

  • See: https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html
  • See: std::unordered_map. This has a more standard API, and is thread safe in many situations, see: unordered_map thread safety. Suggest using this, if possible, as it has a simpler API.
  • There is also the concurrent_unordered_map from Intel TBB. It is essentially the same thing, a key/value map. However, it is much older, much much lower level, and more difficult to use. One has to supply a hasher, a equality operator, and an allocator. There is no sample code anywhere, even in the official Intel docs. I never got it working, despite months of occasional attempts. It may be obsolete, as it is not mentioned in said free book (it only covers concurrent_hash_map). Not recommended.

Update: Reader/Writer Locks

There are actually two accessors, one is a read lock, one is a write lock:

  • const_accessor
  • accessor

If using find, use const_accessor which is a read lock. If using insert or erase, use accessor which is a write lock (i.e. it will wait until any reads are done, and block further reads until it is done).

This is effectively equivalent to a reader/writer lock, but on a single dictionary key in the dictonary, rather than the entire dictionary.

Update

Final part of the learning curve: for key writes, nothing happens until the accessor goes out of scope. So any locks are held for no more than a few machine instructions, probably using CAS (Compare And Swap).

Comparing this to a database, the scope of the accessor is like a transaction. When the accessor goes out of scope, the entire transaction is committed to the hashmap.

Update

The free book mentioned above has fantastic performance tips in the chapter on concurrent_hash_map.

Conclusion

The API for this hash map is powerful but somewhat awkward. However, it supports fine-grained, per-key locks on insert/update. Any locks are only held for a handful of machine instructions, using CAS. This is something that few other hashmaps can offer, in any language. Recommend starting with std::unordered_map for simplicity; it is thread safe as long as the two threads do not write to the same key. If blazingly fast performance is required, there is an option to either refactor, or write a compatible wrapper on top with [] accessors and insert_or_update().