Check if value exists across 16 containers

Posted 2020-04-11 07:40

Question:

I have 16 threads that calculate the hash of a key. I'm trying to divide up the work between the threads, because calculating the hash and checking whether it exists sequentially utilizes only a fraction of my CPU power. Currently, I am using a single map container that all threads access under a mutex. However, since the actual hashing takes next to no time at all, the threads mostly sit idle, waiting for another thread to finish its map::count call that checks whether the key exists in the map.

The main goal of this program is brute-force checking for collisions, as I need to be sure there are none before I add the hash function to my project.

Is there a way to use separate maps, or other containers, and determine if said key exists, rather than linearly searching through each map with each key once all the threads are finished? What about some sort of queuing system?

Edit: This is the function I'm trying to thread:

int coll = 0;
map<long, bool> mymap;
string temp;
long myhash;
for (int i = 0; i < 256; i++)
  for (int j = 0; j < 256; j++)
    for (int k = 0; k < 256; k++)
    {
      temp = i;
      temp += j;
      temp += k;
      temp += temp;
      myhash = hash(temp.c_str());

      if (mymap.count(myhash))
      {
        coll++;
        cout << "Collision at " << i << " " << j << " " << k << endl;
      }
      else
      {
        mymap[myhash] = true;
      }
    }

cout << "Number of collisions: " << coll << endl;
cout << "Map size: " << mymap.size() << endl;

Answer 1:

This algorithm seems fairly easy to parallelize with OpenMP:

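int coll = 0;
map<long, bool> mymap;

#pragma omp parallel for
for (int i = 0; i < 256; i++)
  for (int j = 0; j < 256; j++)
    for (int k = 0; k < 256; k++)
    {
      // Build the key inside the loop so every thread has its own copy.
      string temp(1, static_cast<char>(i));
      temp += static_cast<char>(j);
      temp += static_cast<char>(k);
      temp += temp;
      long myhash = hash(temp.c_str());  // hashing runs in parallel

      // std::map is not thread-safe, so the lookup and the insertion
      // are performed together inside one critical section.
      #pragma omp critical
      {
        if (mymap.count(myhash))
        {
          coll++;
          cout << "Collision at " << i << " " << j << " " << k << endl;
        }
        else
        {
          mymap[myhash] = true;
        }
      }
    }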

Some explanation: we start from the assumption that collisions are very rare (it would be a very poor hash function if collisions were frequent). Given this, it is very unlikely that, while one thread is inserting a certain key, another thread simultaneously inserts the exact same key because it happened to stumble upon a different value that hashes to it. Even if this happened, it would be sufficient for only one of them to set the value to true, since it cannot go back to false and subsequent "insertions" only overwrite true with true. That argument covers only the stored values, though: std::map may rebalance its tree on every insertion and is not safe to read while another thread modifies it, so the lookup and the insertion both need to sit inside the same critical section. The hashing itself needs no synchronization and still runs in parallel.
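If contention on the single shared map is still the bottleneck, one alternative is to avoid the shared container during hashing altogether. The following is only a sketch of that idea and is not part of the answer above. Each thread appends its hashes to a private vector, the vectors are concatenated once at the end, and duplicates are counted after a single sort. The names count_collisions and toy_hash are hypothetical; toy_hash is a stand-in for the asker's hash(), and keys are plain three-byte strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the asker's hash(); for illustration only.
unsigned long toy_hash(const std::string& s) {
    unsigned long h = 5381;
    for (unsigned char c : s) h = h * 33 + c;
    return h;
}

// Hashes every 3-byte key whose bytes lie in [0, limit) and returns the
// number of keys whose hash had already been produced by another key.
template <class Hash>
std::size_t count_collisions(int limit, Hash hash_fn) {
    assert(limit >= 0);
    std::vector<unsigned long> all;  // filled only during the merge step

    #pragma omp parallel
    {
        std::vector<unsigned long> local;  // private to each thread

        #pragma omp for nowait
        for (int i = 0; i < limit; i++)
            for (int j = 0; j < limit; j++)
                for (int k = 0; k < limit; k++) {
                    std::string key;
                    key += static_cast<char>(i);
                    key += static_cast<char>(j);
                    key += static_cast<char>(k);
                    local.push_back(hash_fn(key));
                }

        // One short critical section per thread, not one per key.
        #pragma omp critical
        all.insert(all.end(), local.begin(), local.end());
    }

    // Equal hashes become adjacent after sorting.
    std::sort(all.begin(), all.end());
    std::size_t collisions = 0;
    for (std::size_t n = 1; n < all.size(); n++)
        if (all[n] == all[n - 1]) collisions++;
    return collisions;
}
```

Calling count_collisions(256, toy_hash) would cover the same key range as the loops in the question. The trade-off is memory: every hash is kept until the final sort, and this sketch reports only the number of collisions, not which keys collided. Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.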



Answer 2:

Although this has already been answered above, you can improve performance by replacing the std::map::count() call and the insertion via the array operator with something more efficient.

One of the std::map::insert() methods returns a pair where the bool member will be false if the element already existed in the map. Something like this:

int coll = 0;
typedef map<long, bool> MY_MAP_TYPE;
MY_MAP_TYPE mymap;
string temp;
long myhash;
for (int i = 0; i < 256; i++)
    for (int j = 0; j < 256; j++)
        for (int k = 0; k < 256; k++)
        {
            temp = i;
            temp += j;
            temp += k;
            temp += temp;
            myhash = hash(temp.c_str());
            if( mymap.insert( MY_MAP_TYPE::value_type( myhash, true ) ).second == false)
            {
                coll++;
                cout << "Collision at " << i << " " << j << " " << k << endl;
            }
        }
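
The return value of insert() can be seen in isolation in a minimal sketch like the one below. The helper name seen_before is hypothetical and only wraps the single insert() call used above.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical helper: records a hash and reports whether it was already present.
bool seen_before(std::map<long, bool>& seen, long hash_value) {
    // insert() leaves an existing element untouched and returns a pair whose
    // bool member is false; for a new key the bool member is true.
    return !seen.insert(std::make_pair(hash_value, true)).second;
}
```

This does one tree traversal per key, where count() followed by operator[] does two.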