I have .NET and C++ implementations of a perf test function that does 854,750 lookups in a dictionary using string keys from a pool of 6838 keys. I wrote these functions to investigate a perf bottleneck in a real app.
The .NET implementation is written in F#, uses Dictionary, and is compiled for .NET 4.0.
The C++ implementation uses std::unordered_map and is built with VS2010 in Release mode.
On my machine the .NET code runs in 240 ms on average and the C++ code runs in 630 ms. Could you please help me understand the reason for this huge difference in speed?
If I make the key length in the C++ implementation shorter and use the "key_" prefix instead of "key_prefix_", it runs in 140 ms.
Another trick I tried was to replace std::string with a custom immutable string implementation that holds a const char* pointer to the source and a one-time computed hash. Using this string brought the C++ implementation down to 190 ms.
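Roughly, the idea behind that custom key type is sketched below. This is a simplified illustration of the approach, not the exact code that produced the 190 ms number:

// needs <cstring> for strlen/memcmp
struct CachedHashKey
{
    // Points at externally owned characters and caches a hash computed once;
    // the pointed-to buffer must outlive any map entry that stores this key.
    const char* Chars;
    size_t Length;
    size_t Hash;

    explicit CachedHashKey(const char* chars)
        : Chars(chars), Length(std::strlen(chars))
    {
        size_t h = 0;
        for (size_t i = 0; i < Length; ++i)
            h = h * 31 + static_cast<unsigned char>(chars[i]);
        Hash = h;
    }

    bool operator==(const CachedHashKey& other) const
    {
        return Length == other.Length &&
               std::memcmp(Chars, other.Chars, Length) == 0;
    }
};

struct CachedHashKeyHasher
{
    size_t operator()(const CachedHashKey& key) const { return key.Hash; }
};

// used as: std::unordered_map<CachedHashKey, SomeData, CachedHashKeyHasher>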
C++ code:
struct SomeData
{
public:
    float Value;
};

typedef std::string KeyString;
typedef std::unordered_map<KeyString, SomeData> DictionaryT;

const int MaxNumberOfRuns = 125;
const int MaxNumberOfKeys = 6838;

DictionaryT dictionary;
dictionary.rehash(MaxNumberOfKeys);

auto timer = Stopwatch::StartNew();
int lookupCount = 0;

char keyBuffer[100] = "key_prefix_";
size_t keyPrefixLen = std::strlen(keyBuffer);

/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for(int runId = 0; runId < MaxNumberOfRuns; runId++)
{
    for(int keyId = 0; keyId < MaxNumberOfKeys; keyId++)
    {
        /// get a new key from the pool of MaxNumberOfKeys keys
        int randomKeySuffix = (std::rand() % MaxNumberOfKeys);
        ::itoa(randomKeySuffix, keyBuffer + keyPrefixLen, 10);
        KeyString key = keyBuffer;

        /// lookup key in the dictionary
        auto dataIter = dictionary.find(key);
        SomeData* data;
        if(dataIter != dictionary.end())
        {
            /// get existing value
            data = &dataIter->second;
        }
        else
        {
            /// add a new value
            data = &dictionary.insert(dataIter, DictionaryT::value_type(key, SomeData()))->second;
        }

        /// update corresponding value in the dictionary
        data->Value += keyId * runId;
        lookupCount++;
    }
}

timer.Stop();

std::cout << "Time: " << timer.GetElapsedMilleseconds() << " ms" << std::endl;
std::cout << "Lookup count: " << lookupCount << std::endl;
Prints:
Time: 636 ms
Lookup count: 854750
F# code:
open System
open System.Diagnostics
open System.Collections.Generic
type SomeData =
    struct
        val mutable Value : float
    end
let dictionary = new Dictionary<string, SomeData>()
let randomGen = new Random()
let MaxNumberOfRuns = 125
let MaxNumberOfKeys = 6838
let timer = Stopwatch.StartNew()
let mutable lookupCount = 0
/// run MaxNumberOfRuns * MaxNumberOfKeys iterations
for runId in 1 .. MaxNumberOfRuns do
    for keyId in 1 .. MaxNumberOfKeys do
        /// get a new key from the pool of MaxNumberOfKeys keys
        let randomKeySuffix = randomGen.Next(0, MaxNumberOfKeys).ToString()
        let key = "key_prefix_" + randomKeySuffix
        /// lookup key in the dictionary
        let mutable found, someData = dictionary.TryGetValue (key)
        if not(found) then
            /// add a new value
            someData <- new SomeData()
            dictionary.[key] <- someData
        /// update corresponding value in the dictionary
        someData.Value <- someData.Value + float(keyId) * float(runId)
        lookupCount <- lookupCount + 1
timer.Stop()
printfn "Time: %d ms" timer.ElapsedMilliseconds
printfn "Lookup count: %d" lookupCount
Prints:
Time: 245 ms
Lookup count: 854750
Visual Studio 2010 uses a performant hash function for std::string, rather than an accurate one. Basically, once the key string reaches 10 characters, the hash function stops using every character and samples with a stride greater than 1:

- size() >= 10: use every second character after the first
- size() >= 20: use every third character after the first

Thanks to this, collisions happen more frequently, which of course slows the code down. Try a custom hash function for the C++ version.
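For example, a drop-in hasher that feeds every character through an FNV-1a loop could look roughly like this (using the KeyString and SomeData types from the question; other full-string hashes would work just as well):

struct FullStringHash
{
    size_t operator()(const KeyString& key) const
    {
        // FNV-1a over every character, so long keys sharing a prefix
        // still get well-distributed hash values.
        size_t hash = 2166136261u;
        for (size_t i = 0; i < key.size(); ++i)
        {
            hash ^= static_cast<unsigned char>(key[i]);
            hash *= 16777619u;
        }
        return hash;
    }
};

typedef std::unordered_map<KeyString, SomeData, FullStringHash> DictionaryT;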
We can only speculate about why one version is faster than the other; you could definitely use a profiler here to tell you where the hot spot is, so don't take any of this as a definitive answer.
Your note about the C++ version being faster with a shorter key length is illuminating, because it could point to a couple of things (hashing cost or allocation cost). Off the cuff, here are some wild observations based on my experience with unordered_map (though I'm more familiar with the Boost implementation than Microsoft's):
In this example there's no reason to use std::string as the key type; just use the integer value. This would presumably make both the C++ and F# versions faster.
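A sketch of that variant, reusing SomeData and the constants from the question (this assumes the integer suffix alone identifies an entry):

typedef std::unordered_map<int, SomeData> IntDictionaryT;

IntDictionaryT intDictionary;
intDictionary.rehash(MaxNumberOfKeys);

for (int runId = 0; runId < MaxNumberOfRuns; runId++)
{
    for (int keyId = 0; keyId < MaxNumberOfKeys; keyId++)
    {
        int randomKeySuffix = std::rand() % MaxNumberOfKeys;
        // no string formatting, string hashing, or allocation per lookup
        intDictionary[randomKeySuffix].Value += keyId * runId;
    }
}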
When you insert values into the map, it's probably not faster to do a find followed by an insert, since both will require re-hashing the key string. Just use the [] operator, which does a find-or-insert operation on its own. (I guess this would depend on how often you find a hit in the map versus adding a new value.)
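For instance, the find/insert pair in the question's loop could collapse to something like this (the key is hashed only once either way):

KeyString key = keyBuffer;
// find-or-insert in one step; a missing key gets a default-constructed SomeData
SomeData& data = dictionary[key];
data.Value += keyId * runId;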
If allocation is the bottleneck and you must use a string key type, you might get better performance out of storing a shared_ptr to the string rather than copying the string when you insert it into the map.
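One way to set that up is to key the map on a shared_ptr and give unordered_map hash and equality functors that look through the pointer; a sketch (needs <memory> and <functional>):

struct SharedStringHash
{
    size_t operator()(const std::shared_ptr<const std::string>& s) const
    {
        return std::hash<std::string>()(*s);
    }
};

struct SharedStringEqual
{
    bool operator()(const std::shared_ptr<const std::string>& a,
                    const std::shared_ptr<const std::string>& b) const
    {
        // compare by contents, not by pointer identity
        return *a == *b;
    }
};

typedef std::unordered_map<std::shared_ptr<const std::string>, SomeData,
                           SharedStringHash, SharedStringEqual> SharedKeyDictionaryT;

Note that the lookup side still has to build or reuse a shared_ptr, so this only pays off if the keys are pooled up front rather than rebuilt on every probe.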
Try providing your own hash function for the key type that ignores the "key_prefix_" part of the string.
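For example (this assumes every key really does start with the 11-character "key_prefix_"):

struct SuffixOnlyHash
{
    size_t operator()(const KeyString& key) const
    {
        // skip the shared "key_prefix_" part; it carries no distinguishing information
        const size_t PrefixLen = 11; // strlen("key_prefix_")
        size_t hash = 2166136261u;
        for (size_t i = (key.size() > PrefixLen ? PrefixLen : 0); i < key.size(); ++i)
        {
            hash ^= static_cast<unsigned char>(key[i]);
            hash *= 16777619u;
        }
        return hash;
    }
};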
Try Boost's implementation; maybe it's faster.
Again, a profile run would quickly tell you where to look for this kind of problem. Specifically, it will tell you if there's a bottleneck in hashing vs. a bottleneck in allocation.
When you're dealing with pure data-structure code, a speed ratio of 2.6 is not that strange. Take a look at the slides on this project to see what I mean.