How to use boost::unordered_map

Posted 2019-06-08 07:56

Question:

For my application I need a hash map, so I wrote a test program that stores instances of a base class in a boost::unordered_map. I want to reach the instances through special functions that return a class derived from the base, and I use those functions' parameters as the hash key of the unordered_map. If no instance is found for the given parameters, one is created and stored in the map. The purpose of the program may not be clear, but here is the code.

#include <boost/unordered_map.hpp>
#include <cstdlib>     // rand
#include <functional>  // std::unary_function
#include <iostream>

using namespace std;
using namespace boost;
typedef unsigned char BYT;
typedef unsigned long long ULL;

class BaseClass
{
public:
    int sign;
    size_t HASHCODE;
    BaseClass(){}
};

class  ClassA : public BaseClass
{
public:
    int AParam1;
    int AParam2;
    ClassA(int s1, int s2) : AParam1(s1), AParam2(s2)
    {
        sign = AParam1;
    }
};


struct HashKey
{
    ULL * hasharray;
    size_t hashNum;
    size_t HASHCODE;
    HashKey(ULL * ULLarray,  size_t Hashnum) : hasharray(ULLarray), hashNum(Hashnum), HASHCODE(0)
    {   }
    bool operator == (const HashKey & hk ) const
    {
        bool deg = (hashNum == hk.hashNum);
        if (deg)
        {
            for (int i = 0; i< hashNum;i++)
                if(hasharray[i] != hk.hasharray[i]) return false;
        }
        return deg;
    }
};

struct ihash : std::unary_function<HashKey, std::size_t>
{
    std::size_t operator()(HashKey const & x) const
    {
        std::size_t seed = 0;
        if (x.hashNum == 1)
            seed = x.hasharray[0];
        else
        {
            int amount = x.hashNum * 8;
            const std::size_t fnv_prime = 16777619u;
            BYT * byt = (BYT*)x.hasharray;
            for (int i = 0; i< amount;i++)
            {
                seed ^= byt[0];
                seed *= fnv_prime;
            }
        }
        return seed;
    }
};

typedef std::pair<HashKey,BaseClass*> HashPair;
unordered_map<HashKey,BaseClass*,ihash> UMAP;
typedef unordered_map<HashKey,BaseClass*,ihash>::iterator iter;


BaseClass * & FindClass(ULL* byt, int Num, size_t & HCode)
{
    HashKey hk(byt,Num); 
    HashPair hp(hk,0);
    std::pair<iter,bool> xx = UMAP.insert(hp);
//  if (xx.second) UMAP.rehash((UMAP.size() + 1) / UMAP.max_load_factor() + 1);
    if (!xx.first->second) HCode = UMAP.hash_function()(hk);
    return xx.first->second;
}


template <typename T, class A,class B> 
T* GetClass(size_t& hashcode ,A a, B b)
{   
    ULL byt[3] = {a,b,hashcode};
    BaseClass *& cls = FindClass(byt, 3, hashcode);
    if(! cls){ cls = new T(a,b); cls->HASHCODE = hashcode;}
    return static_cast<T*>(cls);
}



ClassA * findA(int Period1, int Period2)
{
    size_t classID = 100;
    return GetClass<ClassA>(classID,Period1,Period2);
}

int main(int argc, char* argv[])
{
    int limit = 1000;
    int modnum = 40;
    int result = 0;

    for(int i = 0 ; i < limit; i++ )
    {
        result += findA( rand() % modnum ,4)->sign ;
    }

    cout << UMAP.size() << "," << UMAP.bucket_count() << "," << result <<  endl;

    int x = 0;

    for(iter it =  UMAP.begin(); it != UMAP.end(); it++)
    {
        cout << ++x << "," << it->second->HASHCODE << "," << it->second->sign << endl ;
        delete it->second;

    }

    return 0;
}

The problem is that I expect the size of UMAP to equal modnum, but it is always greater than modnum, which means there is more than one instance with the same parameters and HASHCODE.

What is the solution to my problem? Please help.
Thanks.

Answer 1:

Here are a couple of design problems:

struct HashKey
{
    ULL * hasharray;
    ...

Your key type stores a pointer to some array. But this pointer is initialized with the address of a local object:

BaseClass * & FindClass(ULL* byt, int Num, size_t & HCode)
{
    HashKey hk(byt,Num); // <-- !!!
    HashPair hp(hk,0);
    std::pair<iter,bool> xx = UMAP.insert(hp);
    if (!xx.first->second) HCode = UMAP.hash_function()(hk);
    return xx.first->second;
}

template <typename T, class A,class B> 
T* GetClass(size_t& hashcode ,A a, B b)
{   
    ULL byt[3] = {a,b,hashcode}; // <-- !!!
    BaseClass *& cls = FindClass(byt, 3, hashcode);
    if(! cls){ cls = new T(a,b); cls->HASHCODE = hashcode;}
    return static_cast<T*>(cls);
}

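This makes the map store a HashKey object with a dangling pointer: once GetClass returns, the array byt no longer exists, so every later hash or comparison of the stored key reads dead stack memory, which is undefined behaviour. That is also why equal parameters can end up as separate entries. Also note that FindClass returns a reference obtained through the function-local pair xx. That reference refers to the mapped value stored inside UMAP rather than to xx itself, so it is only usable while the element stays in the map.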

Consider renaming the map's key type: the hash code itself should not be the key. As your operator== for HashKey suggests, the actual key is the variable-length sequence of integers, not the hash code. Also consider storing the sequence inside the key type, for example as a std::vector, instead of storing a pointer to it. In addition, be careful when returning references, and never return one to a function-local object.
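
To make the suggestion concrete, here is a minimal sketch of a key that owns its integers. The names SeqKey, SeqKeyHash, Entry, EntryMap, find_or_create and count_distinct are made up for illustration and are not part of the question's code. It also assumes std::unordered_map (C++11) in place of boost::unordered_map so that it compiles without Boost, and it replaces rand() with i % modnum to keep the result deterministic. The hash is an FNV-1a style loop that, unlike the question's version, mixes in every byte rather than only the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

typedef unsigned long long ULL;

// Hypothetical key type: owns its integers instead of pointing at a caller's array.
struct SeqKey
{
    std::vector<ULL> values;

    bool operator==(const SeqKey& other) const
    {
        return values == other.values;   // compares length and contents
    }
};

// FNV-1a style hash over every byte of every value in the key.
struct SeqKeyHash
{
    std::size_t operator()(const SeqKey& key) const
    {
        std::size_t seed = 2166136261u;           // 32-bit FNV offset basis
        const std::size_t fnv_prime = 16777619u;  // 32-bit FNV prime
        for (std::size_t i = 0; i < key.values.size(); ++i)
        {
            const ULL v = key.values[i];
            for (std::size_t b = 0; b < sizeof(ULL); ++b)
            {
                seed ^= static_cast<unsigned char>(v >> (8 * b));
                seed *= fnv_prime;
            }
        }
        return seed;
    }
};

// Stand-in for the question's ClassA, stored by value so nothing needs delete.
struct Entry
{
    int sign;
};

typedef std::unordered_map<SeqKey, Entry, SeqKeyHash> EntryMap;

// Returns the entry for (a, b), creating it on first use.
// The returned reference points into the map and stays valid until the
// element is erased or the map is destroyed.
Entry& find_or_create(EntryMap& map, int a, int b)
{
    SeqKey key;
    key.values.push_back(static_cast<ULL>(a));
    key.values.push_back(static_cast<ULL>(b));
    std::pair<EntryMap::iterator, bool> result =
        map.insert(std::make_pair(key, Entry()));
    if (result.second)                    // newly inserted: initialise it
        result.first->second.sign = a;
    return result.first->second;
}

// Mirrors the question's loop with deterministic inputs instead of rand().
std::size_t count_distinct(int limit, int modnum)
{
    assert(modnum > 0);
    EntryMap map;
    for (int i = 0; i < limit; ++i)
        find_or_create(map, i % modnum, 4);
    return map.size();
}
```

With this layout, count_distinct(1000, 40) should report exactly 40 entries, because each stored key carries its own copy of the parameters and nothing refers to stack memory that has gone away.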



Answer 2:

Using unordered_map does not guarantee that you will not get hash collisions, which is what you describe here:

there are more than one instance that has the same parameters and HASHCODE

You can tune your hashing algorithm to minimize this, but in the (inevitable) collision case, the hash container extends the list of objects in the bucket corresponding to that hashcode. Equality comparison is then used to resolve the collision to a specific matching object. This may be where your problem lies - perhaps your operator== does not properly disambiguate similar but not identical objects.
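
As a rough illustration of how equality resolves collisions, the sketch below uses a deliberately useless hash that sends every key to the same bucket. ParamKey, ConstantHash and distinct_with_colliding_hash are hypothetical names, and std::unordered_map is assumed instead of the Boost container so the example needs no extra libraries. Even with every key colliding, the map should still end up with one element per distinct key, because operator== decides whether two keys are the same.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

// Hypothetical key: a pair of parameters compared by value.
struct ParamKey
{
    int p1;
    int p2;

    bool operator==(const ParamKey& other) const
    {
        return p1 == other.p1 && p2 == other.p2;
    }
};

// Deliberately terrible hash: every key lands in the same bucket.
struct ConstantHash
{
    std::size_t operator()(const ParamKey&) const { return 0; }
};

// Inserts `limit` keys that cycle through `modnum` distinct values and
// returns how many elements the map ends up holding.
std::size_t distinct_with_colliding_hash(int limit, int modnum)
{
    assert(modnum > 0);
    std::unordered_map<ParamKey, int, ConstantHash> map;
    for (int i = 0; i < limit; ++i)
    {
        ParamKey key = { i % modnum, 4 };
        map.insert(std::make_pair(key, i));  // does nothing if an equal key exists
    }
    return map.size();
}
```

A bad hash like this only costs speed, since every lookup walks one long bucket. Duplicate entries for "equal" parameters therefore point at the equality comparison, or at the data it reads, rather than at collisions as such.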

You cannot expect one object per bucket; otherwise the number of buckets would have to grow without bound for large collections.
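
If you want to see how elements are spread over buckets, something along these lines may help. It is only a sketch: total_in_buckets and build_dense_map are invented helper names, the map of int to int is a placeholder for your own types, and std::unordered_map is assumed (boost::unordered_map offers the same bucket_count, bucket_size and max_load_factor members). Raising max_load_factor to 4 lets the container average up to four elements per bucket before it rehashes.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Sums bucket_size() over all buckets; the total always equals size().
std::size_t total_in_buckets(const std::unordered_map<int, int>& map)
{
    std::size_t total = 0;
    for (std::size_t b = 0; b < map.bucket_count(); ++b)
        total += map.bucket_size(b);
    return total;
}

// Builds a map that is allowed to average up to four elements per bucket.
std::unordered_map<int, int> build_dense_map(int count)
{
    assert(count >= 0);
    std::unordered_map<int, int> map;
    map.max_load_factor(4.0f);
    for (int i = 0; i < count; ++i)
        map[i] = i * i;
    return map;
}
```

Printing size(), bucket_count() and load_factor() for such a map shows that the bucket count is an implementation detail and says nothing about how many distinct keys are stored.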

By the way, if you are using a newer compiler (C++11 or later), you may find that it supports std::unordered_map, so you can use the standard library version instead of the Boost version.