push_back() and emplace_back() behind the scenes

Posted 2019-04-05 16:49

Question:

I'm currently learning C++ on my own, and I am curious about how push_back() and emplace_back() work under the hood. I've always assumed that emplace_back() is faster when you are trying to construct and push a large object to the back of a container, like a vector.

Let's suppose I have a Student object that I want to append to the back of a vector of Students.

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           /* initialize member variables */ { }
};

Suppose I call push_back() and push a Student object to the end of a vector:

vector<Student> vec;
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));

My understanding here is that push_back creates an instance of the Student object outside of the vector and then moves it to the back of the vector.


I can also emplace instead of push:

vector<Student> vec;
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);

My understanding here is that the Student object is constructed at the very back of the vector so that no moving is required.
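This mental model can be checked by counting constructor calls. The sketch below is not part of the original code: `Tracer`, `Counts`, and the two `count_*` functions are hypothetical names introduced for illustration. Capacity is reserved up front so that reallocation does not add moves of its own.

```cpp
#include <vector>

// Hypothetical instrumented type: records how each instance came to exist.
struct Tracer {
    static int constructed;  // built from an int argument
    static int copied;       // copy-constructed
    static int moved;        // move-constructed

    explicit Tracer(int) { ++constructed; }
    Tracer(const Tracer&) { ++copied; }
    Tracer(Tracer&&) noexcept { ++moved; }
};
int Tracer::constructed = 0;
int Tracer::copied = 0;
int Tracer::moved = 0;

struct Counts { int constructed; int copied; int moved; };

void reset_counts() { Tracer::constructed = Tracer::copied = Tracer::moved = 0; }

// Append one temporary with push_back and report the counters.
Counts count_push_back() {
    reset_counts();
    std::vector<Tracer> vec;
    vec.reserve(1);            // no reallocation during the append
    vec.push_back(Tracer(1));  // temporary built here, then moved into the vector
    return {Tracer::constructed, Tracer::copied, Tracer::moved};
}

// Append one element with emplace_back and report the counters.
Counts count_emplace_back() {
    reset_counts();
    std::vector<Tracer> vec;
    vec.reserve(1);
    vec.emplace_back(1);       // built directly in the vector's storage
    return {Tracer::constructed, Tracer::copied, Tracer::moved};
}
```

With this type, `count_push_back()` reports one construction and one move, while `count_emplace_back()` reports one construction and no move.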


Thus, it would make sense that emplacing would be faster, especially if many Student objects are added. However, when I timed these two versions of code:

for (int i = 0; i < 10000000; ++i) {
    vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
}

and

for (int i = 0; i < 10000000; ++i) {
    vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
}

I expected the latter to be faster, since the large Student object wouldn't have to be moved. Oddly enough, the emplace_back version ended up being slower (across multiple attempts). I also tried inserting 10000000 Student objects with a constructor that takes its arguments by reference, passing variables rather than literals to push_back() and emplace_back(). That made no difference: emplace_back() was still slower.

I've checked to make sure that I'm inserting the same number of objects in both cases. The time difference isn't too large, but emplacing ended up slower by a few seconds.
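The question does not show how the loops were timed. One possible way to measure them is sketched below, assuming `std::chrono::steady_clock`; `Timing`, `time_appends`, `time_push_back`, and `time_emplace_back` are hypothetical helpers, not the code that was actually run.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Same shape as the Student struct in the question (copying constructor).
struct Student {
    std::string name;
    int student_ID;
    double GPA;
    std::string favorite_food;
    std::string favorite_prof;
    int hours_slept;
    int birthyear;
    Student(std::string name_in, int ID_in, double GPA_in, std::string food_in,
            std::string prof_in, int sleep_in, int birthyear_in)
        : name(name_in), student_ID(ID_in), GPA(GPA_in),
          favorite_food(food_in), favorite_prof(prof_in),
          hours_slept(sleep_in), birthyear(birthyear_in) {}
};

struct Timing { double ms; std::size_t size; };

// Hypothetical helper: calls append(vec) n times on a pre-reserved vector
// and returns the elapsed wall-clock time plus the final element count.
template <typename Append>
Timing time_appends(std::size_t n, Append append) {
    std::vector<Student> vec;
    vec.reserve(n);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) append(vec);
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double, std::milli>(stop - start).count(), vec.size()};
}

Timing time_push_back(std::size_t n) {
    return time_appends(n, [](std::vector<Student>& v) {
        v.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
    });
}

Timing time_emplace_back(std::size_t n) {
    return time_appends(n, [](std::vector<Student>& v) {
        v.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
    });
}
```

Timings like these only mean something in an optimized build (for example `-O2` with g++), and it is worth repeating each measurement several times.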

Is there something wrong with my understanding of how push_back() and emplace_back() work? Thank you very much for your time!

Here's the code, as requested. I'm using the g++ compiler.

Push back:

#include <string>
#include <vector>
using namespace std;

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
    return 0;
}

Emplace back:

#include <string>
#include <vector>
using namespace std;

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
    return 0;
}

Answer 1:

This behavior is due to the complexity of std::string. Two things interact here:

  • The Small String Optimization (SSO).
  • In the push_back version, the compiler can determine the length of each string at compile time; in the emplace_back version it cannot, so the emplace_back call requires calls to strlen. Furthermore, since the compiler doesn't know the length of the string literal there, it has to emit code for both the SSO and non-SSO cases (see Jason Turner's "Initializer Lists Are Broken, Let's Fix Them"; it's a long talk, but he follows the problem of inserting strings into a vector throughout it).
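To see whether a given string is stored in the SSO buffer, one rough heuristic is to check whether its character data lies inside the std::string object itself. The helper `stored_inline` below is a hypothetical name. It relies on implementation details that the standard does not guarantee, so treat it as a diagnostic sketch for mainstream standard libraries, not as portable code.

```cpp
#include <cstdint>
#include <string>

// Heuristic, implementation-dependent: returns true when the character data
// is located within the bytes of the std::string object (the SSO buffer),
// and false when it lives somewhere else (typically the heap).
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}
```

On implementations that use SSO, short strings such as "Bob", "pizza", and "Smith" are expected to be stored inline, while a string of a hundred characters is not.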

Consider this simpler type:

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{a}
    , b{b}
    , c{c}
  {}
};

Note how the constructor copies a, b, and c.

Testing this against a baseline of just allocating memory, we can see that push_back outperforms emplace_back:

(Benchmark chart; the original image links to quick-bench.)

Because the strings in your example all fit inside the SSO buffer, copying is just as cheap as moving in this case. Thus, the constructor is perfectly efficient, and the improvements from emplace_back have a smaller effect.

Also, if we search the assembly for both a call to push_back and a call to emplace_back:

// push_back call
void foo_push_back(std::vector<type>& vec) {
    vec.push_back({"Bob", "pizza", "Smith"});
}
// emplace_back call
void foo_emplace_back(std::vector<type>& vec) {
    vec.emplace_back("Bob", "pizza", "Smith");
}

(Assembly not copied here. It's massive. std::string is complicated)

We can see that emplace_back has calls to strlen, whereas push_back does not. With emplace_back, the string literals are forwarded through several layers of code before the std::string objects are actually constructed, and across that distance the compiler was unable to optimize out the calls to strlen.
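It can help to look at what a forwarding function such as emplace_back actually receives when handed a literal. In the sketch below, `probe` and `deduced` are hypothetical stand-ins that mimic the `Args&&...` signature for a single argument; they are not part of the standard library.

```cpp
#include <cstddef>
#include <type_traits>

// Carrier for a deduced type, so it can be inspected at compile time.
template <typename T>
struct deduced { using type = T; };

// Mimics the parameter shape of emplace_back(Args&&...) for one argument
// and reports what Arg&& collapses to.
template <typename Arg>
constexpr deduced<Arg&&> probe(Arg&&) { return {}; }

// The literal "Bob" arrives as a reference to an array of 4 const chars.
using literal_param = decltype(probe("Bob"))::type;

static_assert(std::is_same<literal_param, const char (&)[4]>::value,
              "a string literal is forwarded as a reference to a char array");

// The array extent (3 characters plus the terminating null) is in the type.
constexpr std::size_t forwarded_extent =
    std::extent<std::remove_reference<literal_param>::type>::value;
```

The length is therefore still available in the type while the argument is being forwarded. However, std::string has no constructor that takes an array by reference, so the argument decays to `const char*` and the length is recomputed through `traits_type::length`, which is where the strlen calls come from.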

Explicitly calling the std::string constructor would remove the calls to strlen, but would no longer construct them in place, so that doesn't work to speed up emplace_back.
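For reference, the two call forms being compared look like this. The sketch reuses the `type` struct from above; `build_both` is a hypothetical helper added only to keep the example self-contained.

```cpp
#include <string>
#include <vector>

struct type {
    std::string a;
    std::string b;
    std::string c;

    type(std::string a, std::string b, std::string c)
        : a{a}, b{b}, c{c} {}
};

std::vector<type> build_both() {
    std::vector<type> vec;
    vec.reserve(2);
    // Literals are forwarded; the std::string objects are created inside emplace_back.
    vec.emplace_back("Bob", "pizza", "Smith");
    // std::string temporaries are created at the call site, then passed along.
    vec.emplace_back(std::string("Bob"), std::string("pizza"), std::string("Smith"));
    return vec;
}
```

Both calls produce equal elements. The second builds the strings where the literal is visible, at the price of handing temporaries to the constructor instead of building the strings from the literals inside emplace_back.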

All this said, if we leave the SSO by using long enough strings, the allocation cost completely drowns out these details, so both emplace_back and push_back have the same performance:

(Benchmark chart; the original image links to quick-bench.)


If you fix the constructor of type to move its arguments, emplace_back becomes faster in all cases.

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{std::move(a)}
    , b{std::move(b)}
    , c{std::move(c)}
  {}
};
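The difference between the copying and the moving constructor can be made visible by counting operations. The sketch below swaps std::string for a hypothetical `Counted` member that tallies copies and moves; `CopyingCtor`, `MovingCtor`, `Tally`, and `construct_from_lvalue` are likewise names invented for this illustration.

```cpp
#include <utility>

// Hypothetical stand-in for std::string that counts copies and moves.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// By-value parameter, then copied into the member (like the first `type`).
struct CopyingCtor {
    Counted a;
    explicit CopyingCtor(Counted a_in) : a{a_in} {}
};

// By-value parameter, then moved into the member (like the fixed `type`).
struct MovingCtor {
    Counted a;
    explicit MovingCtor(Counted a_in) : a{std::move(a_in)} {}
};

struct Tally { int copies; int moves; };

// Constructs T from an lvalue and reports how many copies and moves occurred.
template <typename T>
Tally construct_from_lvalue() {
    Counted source;
    Counted::copies = Counted::moves = 0;
    T object(source);  // one copy into the parameter, then copy or move into the member
    (void)object;
    return {Counted::copies, Counted::moves};
}
```

Starting from an lvalue, `CopyingCtor` performs two copies, whereas `MovingCtor` performs one copy and one move. For a string that owns heap memory, replacing the second copy with a move avoids an allocation.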

SSO case

(Benchmark chart; the original image links to quick-bench.)

Long case

(Benchmark chart; the original image links to quick-bench.)

However, the SSO push_back case slowed down; the compiler seems to emit extra copies.

The optimal version, which uses perfect forwarding, does not suffer from this drawback (note the scale change on the vertical axis):

struct type {
  std::string a;
  std::string b;
  std::string c;

  template <typename A, typename B, typename C>
  type(A&& a, B&& b, C&& c)
    : a{std::forward<A>(a)}
    , b{std::forward<B>(b)}
    , c{std::forward<C>(c)}
  {}
};

(Benchmark chart; the original image links to quick-bench.)
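A small counting experiment shows why the forwarding constructor does the minimum amount of work. This is a sketch under the same assumptions as any instrumented test: `Counted`, `Forwarding`, `Tally`, and `forward_lvalue_and_rvalue` are hypothetical names, and `Counted` stands in for std::string.

```cpp
#include <utility>

// Hypothetical stand-in for std::string that counts copies and moves.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Two-member version of the perfect-forwarding constructor shown above.
struct Forwarding {
    Counted a;
    Counted b;

    template <typename A, typename B>
    Forwarding(A&& a_in, B&& b_in)
        : a{std::forward<A>(a_in)}
        , b{std::forward<B>(b_in)}
    {}
};

struct Tally { int copies; int moves; };

// Passes one lvalue and one rvalue and reports the resulting operations.
Tally forward_lvalue_and_rvalue() {
    Counted source;
    Counted::copies = Counted::moves = 0;
    Forwarding object(source, Counted{});  // lvalue is copied, rvalue is moved
    (void)object;
    return {Counted::copies, Counted::moves};
}
```

Each argument reaches its member with exactly one operation: a copy for the lvalue and a move for the rvalue, with no intermediate by-value parameter in between. When the argument is a string literal, the member is constructed directly from it.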