I would like to store a mapping of an integer key to a float value in-memory.
I have roughly 130 million keys (and, accordingly, 130 million values).
My focus is on lookup performance -- I have to do many, many millions of lookups.
The C++ STL library has a map class for associative arrays of this sort. I have several questions about map.
What is the storage overhead of map for a dataset of the size mentioned above? How does storage overhead scale, in general, with map?
It looks like the underlying data structure for map is a red-black tree, a self-balancing binary search tree. It sounds like the real-world performance for this is O(log n) for insertion and retrieval.
It mentions O(1) for a hinted insertion. My input is pre-sorted, so I believe I should be able to provide a hint for insertion events. How would I provide this hint, using the methods listed here?
Is there an STL container that provides better lookup performance?
Are there other publicly-available, open-source frameworks with an associative array class that uses an underlying data structure that would perform better than STL map?
If writing my own container class would provide better lookup performance, what data structures might I research?
I am using GCC 4 for this task, running under either Linux or Mac OS X.
I apologize in advance if these are dumb questions. Thank you for your advice.
If your keys are unchanging, you might consider a perfect hash function as an alternative to a standard container.
I don't know what obstacles you'll run into with a dataset of that size, but it might be worth spending a few minutes experimenting.
You may have a look at std::tr1::unordered_map.
But if you have 32 bit unsigned integer keys (4294967296 possible values) and 130 million different keys, then you should write your own container optimized for this task. Especially if the 130 million key case is the usual case (and not only a rare maximum).
4294967296 / 130000000 ≈ 33, so roughly every 33rd number in the whole key space is used in your data.
You could, for example, divide your key range into fixed-size partitions. If the keys are fairly evenly distributed, you could partition the key space into, e.g., 256-key buckets, or even 32-key buckets, depending on how much storage you are willing to waste when only a few values in a bucket are actually stored.
Example, to give you an idea:
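A minimal sketch of the fixed-partition idea, with a hypothetical 256-key bucket size. Buckets are allocated lazily, and in this toy version an absent key simply reads back as 0.0f:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The 32-bit key space is split into 2^24 buckets of 256 keys each.
// A bucket is a plain float array indexed directly by the low 8 bits
// of the key, so lookup is O(1): one shift, one pointer dereference,
// one array index.
class PartitionedMap {
public:
    PartitionedMap() : buckets_(1u << 24, (float*)0) {}
    ~PartitionedMap() {
        for (std::size_t i = 0; i < buckets_.size(); ++i)
            delete[] buckets_[i];
    }
    void put(unsigned key, float value) {
        float*& b = buckets_[key >> 8];
        if (!b) b = new float[256]();  // zero-initialized on first use
        b[key & 0xFF] = value;
    }
    float get(unsigned key) const {
        const float* b = buckets_[key >> 8];
        return b ? b[key & 0xFF] : 0.0f;  // 0.0f doubles as "absent"
    }
private:
    std::vector<float*> buckets_;  // null pointer = bucket not allocated
};
```

Note the pointer table alone costs 2^24 pointers (128 MB on a 64-bit machine), and each touched bucket stores all 256 slots whether used or not; shrinking the bucket size trades that waste against a larger pointer table.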