Is the unordered_map really unordered?

2020-02-01 01:31发布

I am very confused by the name 'unordered_map'. The name suggests that the keys are not ordered at all. But I always thought they are ordered by their hash value. Or is that wrong (because the name implies that they are not ordered)?

Or to put it different: Is this

typedef map<K, V, HashComp<K> > HashMap;

with

template<typename T>
struct HashComp {
    bool operator<(const T& v1, const T& v2) const {
        return hash<T>()(v1) < hash<T>()(v2);
    }
};

the same as

typedef unordered_map<K, V> HashMap;

? (OK, not exactly, STL will complain here because there may be keys k1,k2 and neither k1 < k2 nor k2 < k1. You would need to use multimap and overwrite the equal-check.)

Or again differently: When I iterate through them, can I assume that the key-list is ordered by their hash value?

5条回答
Emotional °昔
2楼-- · 2020-02-01 01:34

In answer to your edited question, no those two snippets are not equivalent at all. std::map stores nodes in a tree structure, unordered_map stores them in a hashtable*.

Keys are not stored in order of their "hash value" because they're not stored in any order at all. They are instead stored in "buckets" where each bucket corresponds to a range of hash values. Basically, the implementation goes like this:

function add_value(object key, object value) {
   int hash = key.getHash();

   int bucket_index = hash % NUM_BUCKETS;
   if (buckets[bucket_index] == null) {
       buckets[bucket_index] = new linked_list();
   }
   buckets[bucket_index].add(new key_value(key, value));
}

function get_value(object key) {
   int hash = key.getHash();

   int bucket_index = hash % NUM_BUCKETS;
   if (buckets[bucket_index] == null) {
       return null;
   }

   foreach(key_value kv in buckets[bucket_index]) {
       if (kv.key == key) {
           return kv.value;
       }
   }
}

Obviously that's a serious simplification and real implementation would be much more advanced (for example, supporting resizing the buckets array, maybe using a tree structure instead of linked list for the buckets, and so on), but that should give an idea of how you can't get back the values in any particular order. See wikipedia for more information.


* Technically, the internal implementation of std::map and unordered_map are implementation-defined, but the standard requires certain Big-O complexity for operations that implies those internal implementations

查看更多
倾城 Initia
3楼-- · 2020-02-01 01:39

As the name unordered_map suggests, no ordering is specified by the C++0x standard. An unordered_map's apparent ordering will be dependent on whatever is convenient for the actual implementation.

查看更多
对你真心纯属浪费
4楼-- · 2020-02-01 01:48

You are right, unordered_map is actually hash ordered. Note that most current implementations (pre TR1) call it hash_map.

The IBM C/C++ compiler documentation remarks that if you have an optimal hash function, the number of operations performed during lookup, insertion, and removal of an arbitrary element does not depend on the number of elements in the sequence, so this mean that the order is not so unordered...

Now, what does it mean that it is hash ordered? As an hash should be unpredictable, by definition you can't take any assumption about the order of the elements in the map. This is the reason why it has been renamed in TR1: the old name suggested an order. Now we know that an order is actually used, but you can disregard it as it is unpredictable.

查看更多
该账号已被封号
5楼-- · 2020-02-01 01:49

If you want an analogy, look at the RDBMS of your choice.

If you don't specify an ORDER BY clause when performing a query, the results are returned "unordered" - that is, in whatever order the database feels like. The order is not specified, and the system is free to "order" them however it likes in order to get the best performance.

查看更多
家丑人穷心不美
6楼-- · 2020-02-01 01:59

"Unordered" doesn't mean that there isn't a linear sequence somewhere in the implementation. It means "you can't assume anything about the order of these elements".

For example, people often assume that entries will come out of a hash map in the same order they were put in. But they don't, because the entries are unordered.

As for "ordered by their hash value": hash values are generally taken from the full range of integers, but hash maps don't have 2**32 slots in them. The hash value's range will be reduced to the number of slots by taking it modulo the number of slots. Further, as you add entries to a hash map, it might change size to accommodate the new values. This can cause all the previous entries to be re-placed, changing their order.

In an unordered data structure, you can't assume anything about the order of the entries.

查看更多
登录 后发表回答