C++ map vs map performance (I know, “ag

2020-02-20 07:59发布

I was using a map with a std::string key and while everything was working fine I wasn't getting the performance I expected. I searched for places to optimize and improved things only a little and that's when a colleague said, "that string key is going to be slow."

I read dozens of questions and they consistently say:

"don't use a char * as a key"
"std::string keys are never your bottleneck"
"the performance difference between a char * and a std::string is a myth."

I reluctantly tried a char * key and there was a difference, a big difference.

I boiled the problem down to a simple example:

#include <stdio.h>
#include <stdlib.h>
#include <map>

#ifdef USE_STRING

#include <string>
typedef std::map<std::string, int> Map;

#else

#include <string.h>
struct char_cmp { 
    bool operator () (const char *a,const char *b) const 
    {
        return strcmp(a,b)<0;
    } 
};
typedef std::map<const char *, int, char_cmp> Map;

#endif

Map m;

bool test(const char *s)
{
    Map::iterator it = m.find(s);
    return it != m.end();
}

int main(int argc, char *argv[])
{
    m.insert( Map::value_type("hello", 42) );

    const int lcount = atoi(argv[1]);
    for (int i=0 ; i<lcount ; i++) test("hello");
}

First the std::string version:

$ g++ -O3 -o test test.cpp -DUSE_STRING
$ time ./test 20000000
real    0m1.893s

Next the 'char *' version:

g++ -O3 -o test test.cpp             
$ time ./test 20000000
real    0m0.465s

That's a pretty big performance difference and about the same difference I see in my larger program.

Using a char * key is a pain to handle freeing the key and just doesn't feel right. C++ experts what am I missing? Any thoughts or suggestions?

6条回答
SAY GOODBYE
2楼-- · 2020-02-20 08:16

One solution to this is use a custom key class that acts as a cross between a const char * and a std::string, but has a boolean to tell at run time if it is "owning" or "non-owning". That way you can insert a key into the map which owns it's data (and will free it on destruction), and then compare with a key that does not own it's data. (This is a similar concept to the rust Cow<'a, str> type).

The below example also inherits from boost's string_ref to avoid having to re-implement hash functions etc.

NOTE this has the dangerous effect that if you accidentally insert into the map with the non-owning version, and the string you are pointing at goes out of scope, the key will point at already freed memory. The non-owning version can only be used for lookups.

#include <iostream>
#include <map>
#include <cstring>

#include <boost/utility/string_ref.hpp>

class MaybeOwned: public boost::string_ref {
public:
  // owning constructor, takes a std::string and copies the data
  // deletes it's copy on destruction
  MaybeOwned(const std::string& string):
    boost::string_ref(
      (char *)malloc(string.size() * sizeof(char)),
      string.size()
    ),
    owned(true)
  {
    memcpy((void *)data(), (void *)string.data(), string.size());
  }

  // non-owning constructor, takes a string ref and points to the same data
  // does not delete it's data on destruction
  MaybeOwned(boost::string_ref string):
    boost::string_ref(string),
    owned(false)
  {
  }

  // non-owning constructor, takes a c string and points to the same data
  // does not delete it's data on destruction
  MaybeOwned(const char * string):
    boost::string_ref(string),
    owned(false)
  {
  }

  // move constructor, tells source that it no longer owns the data if it did
  // to avoid double free
  MaybeOwned(MaybeOwned&& other):
    boost::string_ref(other),
    owned(other.owned)
  {
    other.owned = false;
  }

  // I was to lazy to write a proper copy constructor
  // (it would need to malloc and memcpy again if it owned the data)
  MaybeOwned(const MaybeOwned& other) = delete;

  // free owned data if it has any
  ~MaybeOwned() {
    if (owned) {
      free((void *)data());
    }
  }

private:
  bool owned;
};

int main()
{
  std::map<MaybeOwned, std::string> map;
  map.emplace(std::string("key"), "value");
  map["key"] += " here";
  std::cout << map["key"] << "\n";
}
查看更多
趁早两清
3楼-- · 2020-02-20 08:18

You are using a const char * as a lookup key for find(). For the map containing const char* this is the correct type that find expects and the lookup can be done directly.

The map containing std::string expects the parameter of find() to be a std::string, so in this case the const char* first has to be converted to a std::string. This is probably the difference you are seeing.

查看更多
Bombasti
4楼-- · 2020-02-20 08:18

If your in C++ 11, the copy constructor is not called unless the string is changed. Because std::string is a C++ construct, at least 1 dereference is needed to get at the string data.

My guess would be the time is taken up in an extra dereference (which if done 10000 times is costly), and std::string is likely doing appropriate null pointer checks, which again eats up cycles.

查看更多
ら.Afraid
5楼-- · 2020-02-20 08:19

As sth noted, the issue is one of specifications of the associative containers (sets and maps), in that their member search methods always force a conversion to the key_type, even if an operator< exists that would accept to compare your key against the keys in the map despite their different types.

On the other hand, the functions in <algorithm> do not suffer from this, for example lower_bound is defined as:

template< class ForwardIt, class T >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value );

template< class ForwardIt, class T, class Compare >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value, Compare comp );

So, an alternative could be:

std::vector< std::pair< std::string, int > >

And then you could do:

std::lower_bound(vec.begin(), vec.end(), std::make_pair("hello", 0), CompareFirst{})

Where CompareFirst is defined as:

struct CompareFirst {
     template <typename T, typename U>
     bool operator()(T const& t, U const& u) const { return t.first < u.first; }
};

Or even build a completely custom comparator (but it's a bit harder).

A vector of pair is generally more efficient in read-heavy loads, so it's really to store a configuration for example.

I do advise to provide methods to wrap the accesses. lower_bound is pretty low-level.

查看更多
仙女界的扛把子
6楼-- · 2020-02-20 08:29

After compilation the 2 "Hello" string literals will have the same memory address. On the char * case you use this memory addresses as keys.

In the string case every "Hello"s will be converted to a different object. This is a small part (really really small) of your performance difference.

A bigger part can be that as all the "Hello"s you are using has the same memory address strcmp will always get 2 equivalent char pointers and I'm quite sure that it early checks for this case :) So it will never really iterate on the all characters but the std::string comparison will.

查看更多
做个烂人
7楼-- · 2020-02-20 08:34

Store the std::string as a pointer and then you lose the copy constructor overhead.

But after you have to remember to handle the deletes.

The reason std::string is slow is that is constructs itself. Calls the copy constructor, and then at the end calls delete. If you create the string on the heap you lose the copy construction.

查看更多
登录 后发表回答