C++ unordered_map with char* as key

Posted 2019-01-24 11:11

Question:

I am struggling to use the container unordered_map with char* as the key (on Windows, with VS 2010). I know that I have to define my own comparison function for char*, which inherits from binary_function. The following is a sample program.

#include <cstdio>   // printf
#include <cstring>  // strcmp
#include <unordered_map>
#include <iostream>
#include <string>
using namespace std;

template <class _Tp>  
struct my_equal_to : public binary_function<_Tp, _Tp, bool>  
{  
    bool operator()(const _Tp& __x, const _Tp& __y) const  
    { return strcmp( __x, __y ) == 0; }  
};

typedef unordered_map<char*, unsigned int, ::std::tr1::hash<char*>,  my_equal_to<char*> > my_unordered_map;
//typedef unordered_map<string, unsigned int > my_unordered_map;

my_unordered_map location_map;

int main(){
    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));
    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    printf("map size: %u\n", static_cast<unsigned int>(location_map.size()));
    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end())
    {
        printf("found!\n");
    }

    return 0;
} 

I insert the same C string abc twice and then look it up. The second insertion should fail, leaving only one abc in the unordered_map. However, the reported size is 3. It seems that the comparison function is not working properly here.

Moreover, find behaves strangely: when I run the program many times, the result changes. Sometimes the string abc is found, and other times it is not.

Could anyone help me on this? Your help is very much appreciated!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Edit: After defining my own hash function for char*, the program works properly. The full program is listed below. Thank you all.

#include <cstddef>
#include <cstdio>
#include <cstring>
#include <unordered_map>
using namespace std;

// Equality predicate: compares the characters the keys point to, not the
// pointer values. Inheriting from binary_function is not required (and it
// was removed in C++17).
template <class T>
struct my_equal_to
{
    bool operator()(const T& x, const T& y) const
    { return strcmp(x, y) == 0; }
};


struct Hash_Func {
    // BKDR hash algorithm over the characters of the string
    size_t operator()(const char* str) const
    {
        const size_t seed = 131; // 31 131 1313 13131 131313 etc.
        size_t hash = 0;         // unsigned, so overflow wraps around
        while (*str)
        {
            hash = (hash * seed) + static_cast<unsigned char>(*str);
            ++str;
        }

        return hash & 0x7FFFFFFF;
    }
};

// const char* keys, so that the string literal passed to find() converts
typedef unordered_map<const char*, unsigned int, Hash_Func, my_equal_to<const char*> > my_unordered_map;


int main() {
    my_unordered_map location_map;

    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));
    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20)); // rejected: an equal key is already present

    printf("map size: %u\n", static_cast<unsigned int>(location_map.size()));
    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end())
    {
        printf("found!\n");
    }

    return 0;
}

Note: Using char* as the key type for an unordered_map or other STL containers can be dangerous, because the container stores only the pointer, and the string it points to must stay valid and unchanged for as long as the entry exists. One safe approach is to allocate a block on the heap in main (with new or malloc), for example an array of C strings, fill it with the strings, and insert those into the unordered_map. The block is freed (with delete[] or free) at the end of main, once the map is no longer in use.
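
The lifetime rule in this note can be sketched as follows. This is only an illustration under assumptions: count_unique, CStrHash and CStrEqual are hypothetical names, not part of the original program, and the hash is the same BKDR scheme used above. The point is the ordering: the map is destroyed first, and only then are the key buffers freed.

```cpp
#include <cstddef>
#include <cstring>
#include <unordered_map>
#include <utility>
#include <vector>

// Hash over the characters (BKDR-style, as in the program above).
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s; ++s)
            h = h * 131 + static_cast<unsigned char>(*s);
        return h;
    }
};

// Equality over the characters, not the pointer values.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Hypothetical helper: counts the distinct strings among words[0..n),
// using heap copies of the strings as map keys.
std::size_t count_unique(const char* const* words, std::size_t n) {
    std::vector<char*> owned;  // every buffer we allocate, freed at the end
    owned.reserve(n);
    std::size_t result = 0;
    {
        std::unordered_map<const char*, unsigned int, CStrHash, CStrEqual> m;
        for (std::size_t i = 0; i < n; ++i) {
            char* copy = new char[std::strlen(words[i]) + 1];
            std::strcpy(copy, words[i]);
            owned.push_back(copy);
            m.insert(std::make_pair(static_cast<const char*>(copy), 0u));
        }
        result = m.size();
    }  // the map is destroyed here, while every key is still valid
    for (std::size_t i = 0; i < owned.size(); ++i)
        delete[] owned[i];
    return result;
}
```

Calling count_unique on the three strings from the question ("ab", "abc", "abc") gives 2. A std::string key removes the need for this bookkeeping entirely.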

Answer 1:

Your comparator is fine (although passing a null pointer to strcmp is undefined behavior and should probably be handled).

The hash, ::std::tr1::hash<char*>, hashes the pointer value rather than the characters it points to, so each "abc" (usually) goes into a different bucket and the comparator never gets to compare them.

You need to write your own hash function that guarantees that hash("abc") always gives the same answer, whichever buffer holds the string.

As a quick test, use a hash that always returns 0. Performance will be terrible, but you should see the second "abc" match the first.
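
A minimal sketch of that experiment, assuming hypothetical names (ConstantHash, CStrHash, CStrEqual, count_distinct_keys) that do not appear in the question. The constant hash forces every key into one bucket so that only the comparator decides. The content hash uses FNV-1a-style mixing, which is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unordered_map>
#include <utility>

// Equality over the characters, as in the question's comparator.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Diagnostic hash: all keys collide, so CStrEqual alone decides equality.
struct ConstantHash {
    std::size_t operator()(const char*) const { return 0; }
};

// Content hash: FNV-1a-style mixing with the 32-bit constants
// (an assumption; any hash computed from the characters would do).
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 2166136261u;
        for (; *s; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 16777619u;
        }
        return h;
    }
};

// Inserts "ab", "abc", "abc" from three distinct buffers and returns
// how many entries the map ends up with.
template <class Hash>
std::size_t count_distinct_keys() {
    char a[10] = "ab";
    char b[10] = "abc";
    char c[10] = "abc";
    const char* keys[] = { a, b, c };
    std::unordered_map<const char*, unsigned int, Hash, CStrEqual> m;
    for (const char* k : keys)
        m.insert(std::make_pair(k, 0u));
    return m.size();
}

// Usage check: both hashes treat the two "abc" buffers as one key.
void check_hashes() {
    assert(count_distinct_keys<ConstantHash>() == 2);
    assert(count_distinct_keys<CStrHash>() == 2);
}
```

With a pointer-based hash such as std::hash<const char*> in place of Hash, the same function would usually report 3, which is the behavior described in the question.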

As noted in the comments, using std::string simplifies memory management and provides a library-supported hash and comparator, so just std::unordered_map<std::string, X> will work. This also means that when the unordered map is destroyed, all the strings are deallocated for you. You can even safely construct the std::strings from char arrays on the stack.

If you still want to use char*, you will still need your own comparator and hash, but you can use std::shared_ptr to manage the memory for you (do not use stack instances; allocate with new char[] and give the shared_ptr an array deleter). You will then have a std::unordered_map<std::shared_ptr<char>, X, ...> with no complications from memory leaks later.

If you still want to use char*, you are on the right track, but it is important that you use a memory leak tool such as Purify or Valgrind to make sure that you truly have all the memory management under control. (This is generally a good idea for any project.)
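
One way the shared_ptr variant might look, as a sketch only. SharedCStr, make_key, SharedCStrHash, SharedCStrEqual and build_map are hypothetical names introduced for illustration. The key is a std::shared_ptr<char> that owns a new char[] buffer through an array deleter, and the hash and comparator look through the pointer at the characters.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <unordered_map>
#include <utility>

// Owning key type: the deleter calls delete[] on the character buffer.
typedef std::shared_ptr<char> SharedCStr;

// Hypothetical helper: copies s into a heap buffer owned by the result.
SharedCStr make_key(const char* s) {
    const std::size_t n = std::strlen(s) + 1;
    SharedCStr p(new char[n], std::default_delete<char[]>());
    std::memcpy(p.get(), s, n);
    return p;
}

// Hash over the characters (BKDR-style, as in the question's edit).
struct SharedCStrHash {
    std::size_t operator()(const SharedCStr& k) const {
        std::size_t h = 0;
        for (const char* s = k.get(); *s; ++s)
            h = h * 131 + static_cast<unsigned char>(*s);
        return h;
    }
};

// Equality over the characters, not over the owning pointers.
struct SharedCStrEqual {
    bool operator()(const SharedCStr& a, const SharedCStr& b) const {
        return std::strcmp(a.get(), b.get()) == 0;
    }
};

typedef std::unordered_map<SharedCStr, unsigned int,
                           SharedCStrHash, SharedCStrEqual> SharedKeyMap;

// Builds the map from the question's three strings. The stack arrays
// can go out of scope afterwards because the map owns copies.
SharedKeyMap build_map() {
    char a[10] = "ab";
    char b[10] = "abc";
    char c[10] = "abc";
    SharedKeyMap m;
    m.insert(std::make_pair(make_key(a), 10u));
    m.insert(std::make_pair(make_key(b), 20u));
    m.insert(std::make_pair(make_key(c), 20u));  // rejected: equal key exists
    return m;
}
```

A lookup needs a key of the same type, for example m.find(make_key("abc")), which allocates a temporary copy. That cost is one reason std::string keys are usually the simpler choice.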

Finally, global variables should be avoided.



Answer 2:

Using a char pointer as a key like you are above is almost certainly not what you want to do.

STL containers deal with stored values. In the case of std::unordered_map<char *, unsigned int, ...>, the stored values are pointers to C strings, and those strings may no longer exist when later insertions or removals compare against them.

Note that your my_unordered_map is a global variable but you are trying to insert local char arrays a, b, and c. What do you expect your comparison function my_equal_to() to strcmp() when the inserted C strings go out of scope? (You suddenly have keys pointing to random garbage that can be compared to newly inserted future values.)

It is important that STL map keys be copyable values that cannot have their meanings changed by external program behavior. You should almost certainly use std::string or similar for your key values, even if their construction seems wasteful to you at first glance.

The following will work exactly as you intend things to work above, and is vastly safer:

#include <unordered_map>
#include <iostream>
#include <string>

using namespace std;

// STL containers use copy semantics, so don't use pointers for keys!!
typedef unordered_map<std::string, unsigned int> my_unordered_map;

my_unordered_map location_map;

int main() {
    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));

    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    cout << "map size: " << location_map.size() << endl;

    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end()) {
        cout << "found \"" << it->first << "\": " << it->second << endl;
    }

    return 0;
}


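Answer 3:

A string literal such as "abc" has type const char[4] and converts to a const char* that points into static storage. Whether two identical literals share the same storage is up to the implementation. So:

const char* x = "abc";
const char* y = "abc";
return x==y;

may return true or false, depending on the compiler and its settings. In the question, a, b, and c are separate arrays, so b and c always have different addresses even though both hold "abc", and the literal passed to find has yet another address. Comparing or hashing the pointers therefore tells you nothing about whether the text is equal.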
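
The difference between comparing addresses and comparing text can be shown with a small sketch. The names same_address, same_text and compare_buffers are hypothetical. The example uses two arrays rather than two literals, because distinct arrays are guaranteed to have distinct addresses.

```cpp
#include <cassert>
#include <cstring>

// True when both pointers hold the same address.
bool same_address(const char* x, const char* y) { return x == y; }

// True when both pointers refer to equal character sequences.
bool same_text(const char* x, const char* y) {
    return std::strcmp(x, y) == 0;
}

// Mirrors the arrays b and c from the question.
void compare_buffers() {
    char b[10] = "abc";
    char c[10] = "abc";
    assert(!same_address(b, c));  // two distinct arrays, two addresses
    assert(same_text(b, c));      // but the same characters
}
```

A hash or comparator built on same_address treats b and c as different keys, while one built on same_text treats them as the same key.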



Tags: c++ map