A recent talk about unordered_map
in C++ made me realize, that I should use unordered_map
for most cases where I used map
before, because of the efficiency of lookup ( amortized O(1) vs. O(log n) ). Most times I use a map I use either int
's or std::strings
as keys, hence I've got no problems with the definition of the hash function. The more I thought about it, the more I came to realize that I can't find any reason of using a std::map
in case of simple types over a unordered_map
-- I took a look at the interfaces, and didn't find any significant differences that would impact my code.
Hence the question - is there any real reason to use std::map
over unordered map
in case of simple types like int
and std::string
?
I'm asking from a strictly programming point of view -- I know that it's not fully considered standard, and that it may pose problems with porting.
Also I expect that one of the correct answers might be "it's more efficient for smaller sets of data" because of a smaller overhead (is that true?) -- hence I'd like to restrict the question to cases where the amount of keys is non-trivial (>1 024).
Edit: duh, I forgot the obvious (thanks GMan!) -- yes, map's are ordered of course -- I know that, and am looking for other reasons.
Reasons have been given in other answers; here is another.
std::map (balanced binary tree) operations are amortized O(log n) and worst case O(log n). std::unordered_map (hash table) operations are amortized O(1) and worst case O(n).
How this plays out in practice is that the hash table "hiccups" every once in a while with an O(n) operation, which may or may not be something your application can tolerate. If it can't tolerate it, you'd prefer std::map over std::unordered_map.
I would just point out that... there are many kind of
unordered_map
s.Look up the Wikipedia Article on hash map. Depending on which implementation was used, the characteristics in term of look-up, insertion and deletion might vary quite significantly.
And that's what worries me the most with the addition of
unordered_map
to the STL: they will have to choose a particular implementation as I doubt they'll go down thePolicy
road, and so we will be stuck with an implementation for the average use and nothing for the other cases...For example some hash maps have linear rehashing, where instead of rehashing the whole hash map at once, a portion is rehash at each insertion, which helps amortizing the cost.
Another example: some hash maps use a simple list of nodes for a bucket, others use a map, others don't use nodes but find the nearest slot and lastly some will use a list of nodes but reorder it so that the last accessed element is at the front (like a caching thing).
So at the moment I tend to prefer the
std::map
or perhaps aloki::AssocVector
(for frozen data sets).Don't get me wrong, I'd like to use the
std::unordered_map
and I may in the future, but it's difficult to "trust" the portability of such a container when you think of all the ways of implementing it and the various performances that result of this.Small addition to all of above:
Better use
map
, when you need to get elements by range, as they are sorted and you can just iterate over them from one boundary to another.I'd echo roughly the same point GMan made: depending on the type of use,
std::map
can be (and often is) faster thanstd::tr1::unordered_map
(using the implementation included in VS 2008 SP1).There are a few complicating factors to keep in mind. For example, in
std::map
, you're comparing keys, which means you only ever look at enough of the beginning of a key to distinguish between the right and left sub-branches of the tree. In my experience, nearly the only time you look at an entire key is if you're using something like int that you can compare in a single instruction. With a more typical key type like std::string, you often compare only a few characters or so.A decent hash function, by contrast, always looks at the entire key. IOW, even if the table lookup is constant complexity, the hash itself has roughly linear complexity (though on the length of the key, not the number of items). With long strings as keys, an
std::map
might finish a search before anunordered_map
would even start its search.Second, while there are several methods of resizing hash tables, most of them are pretty slow -- to the point that unless lookups are considerably more frequent than insertions and deletions, std::map will often be faster than
std::unordered_map
.Of course, as I mentioned in the comment on your previous question, you can also use a table of trees. This has both advantages and disadvantages. On one hand, it limits the worst case to that of a tree. It also allows fast insertion and deletion, because (at least when I've done it) I've used a fixed-size of table. Eliminating all table resizing allows you to keep your hash table a lot simpler and typically faster.
One other point: the requirements for hashing and tree-based maps are different. Hashing obviously requires a hash function, and an equality comparison, where ordered maps require a less-than comparison. Of course the hybrid I mentioned requires both. Of course, for the common case of using a string as the key, this isn't really a problem, but some types of keys suit ordering better than hashing (or vice versa).
From: http://www.cplusplus.com/reference/map/map/
"Internally, the elements in a map are always sorted by its key following a specific strict weak ordering criterion indicated by its internal comparison object (of type Compare).
map containers are generally slower than unordered_map containers to access individual elements by their key, but they allow the direct iteration on subsets based on their order."
If you want to compare the speed of your std::map and std::unordered_map implementations, you could use Google's sparsehash project which has a time_hash_map program to time them. For example, with gcc 4.4.2 on an x86_64 Linux system