Pre-allocating buckets in a C++ std::unordered_map

I'm using the std::unordered_map from gnu++0x to store a huge amount of data. I want to pre-allocate space for the large number of elements, since I can bound the total space used.

What I would like to be able to do is call:

std::unordered_map m;
m.resize(pow(2,x));

where x is known.

std::unordered_map doesn't support this. I would rather use std::unordered_map if possible, since it will eventually be part of the standard.

Some other constraints:

Need reliable O(1) access and mutation of the map. The desired hash and comparison functions are already non-standard and somewhat expensive. O(log n) mutation (as with std::map) is too expensive.

-> The expensive hash and comparison also make amortization-based growth way too expensive. Each extra insert requires O(n) operations from those functions, which results in an extra quadratic term in the algorithm's run time, since the exponential storage requirements need O(n) growths.

标签： c++ unordered-map buckets

5条回答

ゆ、 Hurt°

2楼-- · 2020-05-30 04:13

I think rehash and reserve both work only if you know in advance how much memory your mapped value will take. If the mapped value is complicated or dynamically changes in size (e.g. a vector), you will need your own implementation. For example, if your memory size allows, you may reserve the biggest container that may ever happen to exist.

0人赞添加讨论(0) 举报

霸刀☆藐视天下

3楼-- · 2020-05-30 04:16

m.rehash(pow(2,x));

if pow(2, x) is the number of buckets you want preallocated. You can also:

m.reserve(pow(2,x));

but now pow(2, x) is the number of elements you are planning on inserting. Both functions do nothing but preallocate buckets. They don't insert any elements. And they are both meant to be used exactly for your use case.

Note: You aren't guaranteed to get exactly pow(2, x) buckets. Some implementations will use only a number of buckets which is a power of 2. Other implementations will use only a prime number of buckets. Still others will use only a subset of primes for the number of buckets. But in any case, the implementation should accept your hint at the number of buckets you desire, and then internally round up to its next acceptable number of buckets.

Here is the precise wording that the latest (N4660) uses to specify the argument to rehash:

a.rehash(n) : Postconditions: a.bucket_count() >= a.size() / a.max_load_factor() and a.bucket_count() >= n.

This postcondition ensures that bucket()_count() >= n, and that load_factor() remains less than or equal to max_load_factor().

Subsequently reserve(n) is defined in terms of rehash(n):

a.reserve(n) : Same as a.rehash(ceil(n / a.max_load_factor())).

0人赞添加讨论(0) 举报

\"骚年 ilove

4楼-- · 2020-05-30 04:16

I would suggest writing your own allocator for the std::unordered_map that allocates memory exactly in the way you want.

0人赞添加讨论(0) 举报

狗以群分

5楼-- · 2020-05-30 04:20

The constructor takes a parameter "size_type bucket_count" according to http://en.cppreference.com/w/cpp/container/unordered_map/unordered_map

so the simplest way to do what your example code says is:

std::unordered_map m{ pow(2,x) };

This will be more efficient since it's undefined how many buckets will be reserved on construction otherwise, it may have to allocate and then deallocate when you call reserve afterwards.

0人赞添加讨论(0) 举报

来，给爷笑一个

6楼-- · 2020-05-30 04:29

I don't think it matters for an unordered map to have pre-allocated memory. The STL is expected to be O(n) amortized insertion time. Save yourself the hassle of writing your own allocator until you know this is the bottle neck of your code, in my opinion.

0人赞添加讨论(0) 举报

Pre-allocating buckets in a C++ std::unordered_map

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间