Is there any advantage of using map over unordered

2019-01-01 16:40发布

A recent talk about unordered_map in C++ made me realize, that I should use unordered_map for most cases where I used map before, because of the efficiency of lookup ( amortized O(1) vs. O(log n) ). Most times I use a map I use either int's or std::strings as keys, hence I've got no problems with the definition of the hash function. The more I thought about it, the more I came to realize that I can't find any reason of using a std::map in case of simple types over a unordered_map -- I took a look at the interfaces, and didn't find any significant differences that would impact my code.

Hence the question - is there any real reason to use std::map over unordered map in case of simple types like int and std::string?

I'm asking from a strictly programming point of view -- I know that it's not fully considered standard, and that it may pose problems with porting.

Also I expect that one of the correct answers might be "it's more efficient for smaller sets of data" because of a smaller overhead (is that true?) -- hence I'd like to restrict the question to cases where the amount of keys is non-trivial (>1 024).

Edit: duh, I forgot the obvious (thanks GMan!) -- yes, map's are ordered of course -- I know that, and am looking for other reasons.

12条回答
伤终究还是伤i
2楼-- · 2019-01-01 16:55

I was intrigued by the answer from @Jerry Coffin, which suggested that the ordered map would exhibit performance increases on long strings, after some experimentation (which can be downloaded from pastebin), I've found that this only seems to hold true for collections of random strings, when the map is initialised with a sorted dictionary (which contain words with considerable amounts of prefix-overlap), this rule breaks down, presumably because of the increased tree depth necessary to retrieve value. The results are shown below, the 1st number column is insert time, 2nd is fetch time.

g++ -g -O3 --std=c++0x   -c -o stdtests.o stdtests.cpp
g++ -o stdtests stdtests.o
gmurphy@interloper:HashTests$ ./stdtests
# 1st number column is insert time, 2nd is fetch time
 ** Integer Keys ** 
 unordered:      137      15
   ordered:      168      81
 ** Random String Keys ** 
 unordered:       55      50
   ordered:       33      31
 ** Real Words Keys ** 
 unordered:      278      76
   ordered:      516     298
查看更多
若你有天会懂
3楼-- · 2019-01-01 17:00

Hash tables have higher constants than common map implementations, which become significant for small containers. Max size is 10, 100, or maybe even 1,000 or more? Constants are the same as ever, but O(log n) is close to O(k). (Remember logarithmic complexity is still really good.)

What makes a good hash function depends on your data's characteristics; so if I don't plan on looking at a custom hash function (but can certainly change my mind later, and easily since I typedef damn near everything) and even though defaults are chosen to perform decently for many data sources, I find the ordered nature of map to be enough of a help initially that I still default to map rather than a hash table in that case.

Plus that way you don't have to even think about writing a hash function for other (usually UDT) types, and just write op< (which you want anyway).

查看更多
闭嘴吧你
4楼-- · 2019-01-01 17:01

In most languages, unordered map (aka hash based dictionaries) are the default map however in C++ you get ordered map as default map. How did that happen? Some people erroneously assume that C++ committee made this decision in their unique wisdom but the truth is unfortunately uglier than that.

It is widely believed that C++ ended up with ordered map as default because there are not too many parameters on how they can be implemented. On the other hand, hash based implementations has tons of things to talk about. So to avoid gridlocks in standardization they just got along with ordered map. Around 2005, many languages already had good implementations of hash based implementation and so it was more easier for the committee to accept new std::unordered_map. In a perfect world, std::map would have been unordered and we would have std::ordered_map as separate type.

Below two graphs should speak for themselves (source):

enter image description here

enter image description here

Summary

Assuming ordering is not important:

  • If you are going to build large table once and do lots of queries, use std::unordered_map
  • If you are going to build small table (may be under 100 elements) and do lots of queries, use std::map. This is because reads on it are O(log n).
  • If you are going to change table a lot then may be std::map is good option.
  • If you are in doubt, just use std::unordered_map.
查看更多
低头抚发
5楼-- · 2019-01-01 17:04

Significant differences that have not really been adequately mentioned here:

  • map keeps iterators to all elements stable, in C++17 you can even move elements from one map to the other without invalidating invalidating iterators to them (and if properly implemented without any potential allocation).
  • map timings for single operations are typically more consistent, since they never need large allocations.
  • unordered_map using std::hash as implemented in the libstdc++ is vulnerable to DoS if fed with untrusted input (it uses MurmurHash2 with a constant seed - not that seeding would really help, see https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/).
  • Being ordered enables efficient range searches, e.g. iterate over all elements with key >= 42.
查看更多
深知你不懂我心
6楼-- · 2019-01-01 17:05

Don't forget the map's keep their elements ordered. If you can't give up that, obviously you can't use an unordered_map.

Something else to keep in mind is that unordered_map's generally use more memory. A map just has a few house-keeping pointers then memory for each object. Contrarily, unordered_map's have a big array (these can get quite big in some implementations) and then additional memory for each object. If you need to be memory-aware, a map should prove better, because it lacks the large array.

So, if you need pure lookup-retrieval, I'd say an unordered_map is the way to go. But there are always trade-offs, and if you can't afford them, then you can't use it.

Just from personal experience, I found an enormous improvement in performance (measured, of course) when using an unordered_map instead of a map in a main entity look-up table.

On the other hand, I found it was much slower at repeatedly inserting and removing elements. It's great for a relatively static collection of elements, but if you're doing tons of insertions and deletions the hashing + bucketing seems to add up. (Note, this was over many iterations.)

查看更多
几人难应
7楼-- · 2019-01-01 17:10

I've made a test recently which makes 50000 merge&sort. That means if the string keys are the same, merge the byte string. And the final output should be sorted. So this includes a look up for every insertion.

For the map implementation, it takes 200 ms to finish the job. For the unordered_map + map, it takes 70 ms for unordered_map insertion and 80 ms for map insertion. So the hybrid implementation is 50 ms faster.

We should think twice before we use the map. If you only need the data to be sorted in the final result of your program, a hybrid solution may be better.

查看更多
登录 后发表回答