
Is std::move really needed in the initialization list?

Posted 2019-03-09 00:42

Question:

Recently I read an example from cppreference.../vector/emplace_back:

struct President
{
    std::string name;
    std::string country;
    int year;

    President(std::string p_name, std::string p_country, int p_year)
        : name(std::move(p_name)), country(std::move(p_country)), year(p_year)
    {
        std::cout << "I am being constructed.\n";
    }
};

My question: is this std::move really needed? My point is that p_name is not used in the body of the constructor, so maybe there is some rule in the language to use move semantics for it by default?

It would be really annoying to have to add std::move in the initialization list for every heavy member (like std::string or std::vector). Imagine a project of hundreds of KLOC written in C++03: shall we add std::move everywhere?

An answer to this question, move-constructor-and-initialization-list, says:

As a golden rule, whenever you take something by rvalue reference, you need to use it inside std::move, and whenever you take something by universal reference (i.e. deduced templated type with &&), you need to use it inside std::forward

But I am not sure this applies: passing by value is not really a universal reference, is it?

[UPDATE]

To make my question clearer: can the constructor arguments be treated as xvalues, that is, expiring values?

In this example AFAIK we do not use std::move:

std::string getName()
{
   std::string local = "Hello SO!";
   return local; // std::move(local) is not needed nor probably correct
}

So, would it be needed here:

void President::setDefaultName()
{
   std::string local = "SO";
   name = local; // std::move OR not std::move?
}

To me, this local variable is an expiring variable, so move semantics could be applied... And this is similar to arguments passed by value...

Answer 1:

My question: is this std::move really needed? My point is that the compiler sees that p_name is not used in the body of the constructor, so maybe there is some rule to use move semantics for it by default?

In general, when you want to turn an lvalue into an rvalue, then yes, you need a std::move(). See also Do C++11 compilers turn local variables into rvalues when they can during code optimization?

void President::setDefaultName()
{
   std::string local = "SO";
   name = local; // std::move OR not std::move?
}

To me, this local variable is an expiring variable, so move semantics could be applied... And this is similar to arguments passed by value...

Here, I would want the optimizer to eliminate the superfluous local ALTOGETHER; unfortunately, that does not happen in practice. Compiler optimizations get tricky when heap memory comes into play, see BoostCon 2013 Keynote: Chandler Carruth: Optimizing the Emergent Structures of C++. One of my takeaways from Chandler's talk is that optimizers simply tend to give up when it comes to heap-allocated memory.

See the code below for a disappointing example. I don't use std::string in this example because it is a heavily optimized class with inline assembly code, often yielding counterintuitive generated code. To add insult to injury, std::string is, roughly speaking, a reference-counted shared pointer in gcc 4.7.2 at least (the copy-on-write optimization, now forbidden for std::string by the 2011 standard). So here is the example code without std::string:

#include <algorithm>
#include <cstdio>

int main() {
   char literal[] = { "string literal" };
   int len = sizeof literal;
   char* buffer = new char[len];
   std::copy(literal, literal+len, buffer);
   std::printf("%s\n", buffer);
   delete[] buffer;
}

Clearly, according to the "as-if" rule, the generated code could be optimized to this:

int main() {
   std::printf("string literal\n");
}

I have tried it with GCC 4.9.0 and Clang 3.5 with link time optimizations (LTO) enabled, and neither of them could optimize the code to this level. I looked at the generated assembly code: both allocated the memory on the heap and did the copy. Well, yeah, that's disappointing.

Stack allocated memory is different though:

#include <algorithm>
#include <cstdio>

int main() {
   char literal[] = { "string literal" };
   const int len = sizeof literal;
   char buffer[len];
   std::copy(literal, literal+len, buffer);
   std::printf("%s\n", buffer);
}

I have checked the assembly code: Here, the compiler is able to reduce the code to basically just std::printf("string literal\n");.

So my expectation that the superfluous local in your example code could be eliminated is not completely unsupported: as my latter example with the stack-allocated array shows, it can be done.

Imagine a project of hundreds of KLOC written in C++03: shall we add std::move everywhere?
[...]
But I am not sure this applies: passing by value is not really a universal reference, is it?

"Want speed? Measure." (by Howard Hinnant)

You can easily find yourself in a situation that you do your optimizations just to find out that your optimizations made the code slower. :( My advice is the same as Howard Hinnant's: Measure.

std::string getName()
{
   std::string local = "Hello SO!";
   return local; // std::move(local) is not needed nor probably correct
}

Yes, but we have rules for this special case: It is called named return value optimization (NRVO).
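
To make the return case concrete, here is a minimal sketch. Tracker is a hypothetical helper type (not from the question) that counts its own copies and moves. Whether the compiler applies NRVO or falls back to the implicit move, a plain "return local;" should never invoke the copy constructor when a move constructor is available.

```cpp
// Hypothetical helper type: counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Plain `return local;` with no std::move: the copy is either elided (NRVO)
// or, failing that, `local` is treated as an rvalue and moved.
Tracker make_plain() {
    Tracker local;
    return local;
}

// Returns how many copy constructions a plain return caused.
int copies_in_plain_return() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker t = make_plain();
    (void)t;
    return Tracker::copies;
}
```

Calling copies_in_plain_return() reports zero copies; the number of moves depends on how much the compiler elides, so the sketch deliberately does not rely on it.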



Answer 2:

The current rule, as amended by DR1579, is that xvalue transformation occurs when a NRVOable local or parameter, or an id-expression referring to a local variable or parameter, is the argument to a return statement.

This works because, clearly, after the return statement the variable can't be used again.

Except that's not the case:

struct S {
    std::string s;
    S(std::string &&s) : s(std::move(s)) { throw std::runtime_error("oops"); }
};

S foo() {
   std::string local = "Hello SO!";
   try {
       return local;
   } catch(std::exception &) {
       assert(local.empty());
       throw;
   }
}

So even for a return statement, it's not actually guaranteed that a local variable or parameter appearing in that statement is the last use of that variable.

It's not totally out of the question that the standard could be changed to specify that the "last" usage of a local variable is subject to xvalue transformation; the problem is defining what the "last" usage is. And another problem is that this has non-local effects within a function; adding e.g. a debugging statement lower down could mean that an xvalue transformation you were relying on no longer occurs. Even a single-occurrence rule wouldn't work, as a single statement can be executed multiple times.
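
A small sketch of the "executed multiple times" problem, assuming a hypothetical helper named repeat: the expression s occurs exactly once in the source, yet it is evaluated on every loop iteration, so a rule based on textual occurrences could not safely turn it into a move.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: stores `n` copies of `text`.
std::vector<std::string> repeat(const std::string& text, int n) {
    std::string s = text;
    std::vector<std::string> out;
    for (int i = 0; i < n; ++i) {
        // `s` appears only once in the source but is used `n` times at run time.
        // If this use were silently turned into a move, later iterations
        // would read a moved-from string.
        out.push_back(s);
    }
    return out;
}
```

With today's rules every element of repeat("abc", 3) compares equal to "abc", because each iteration copies s.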

Perhaps you'd be interested in submitting a proposal for discussion on the std-proposals mailing list?



Answer 3:

My question: is this std::move really needed? My point is that p_name is not used in the body of the constructor, so maybe there is some rule in the language to use move semantics for it by default?

Of course it's needed. p_name is an lvalue, hence std::move is needed to turn it into an rvalue and select the move constructor.

That's not only what the language says -- what if the type is like this:

struct Foo {
    Foo() { cout << "ctor"; }
    Foo(const Foo &) { cout << "copy ctor"; }
    Foo(Foo &&) { cout << "move ctor"; }
};

The language mandates that copy ctor must be printed if you omit the move. There are no options here. The compiler can't do this any differently.

Yes, copy elision still applies. But not in your case (initialization list), see the comments.
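
The point can be checked with a counter instead of printing. This is only a sketch: Counted, WithoutMove, WithMove and member_copies are hypothetical names introduced for illustration.

```cpp
#include <utility>

// Hypothetical helper type: counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

struct WithoutMove {
    Counted member;
    // `c` is a named parameter, hence an lvalue: the copy ctor is selected.
    explicit WithoutMove(Counted c) : member(c) {}
};

struct WithMove {
    Counted member;
    // std::move(c) is an xvalue: the move ctor is selected.
    explicit WithMove(Counted c) : member(std::move(c)) {}
};

// Counts the copies made while constructing a Holder. The argument is passed
// with std::move, so the parameter itself is never copy-constructed and any
// copy counted comes from the member initializer.
template <class Holder>
int member_copies() {
    Counted source;
    Counted::copies = 0;
    Holder h{std::move(source)};
    (void)h;
    return Counted::copies;
}
```

member_copies<WithoutMove>() yields 1 and member_copies<WithMove>() yields 0, no matter how aggressively the compiler optimizes, because constructor selection is decided by the value category of the expression.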


Or is your question about why we use that pattern at all?

The answer is that it provides a safe pattern when we want to store a copy of the passed argument, while benefiting from moves, and avoiding a combinatorial explosion of the arguments.

Consider this class which holds two strings (i.e. two "heavy" objects to copy).

struct Foo {
     Foo(string s1, string s2)
         : m_s1{s1}, m_s2{s2} {}
private:
     string m_s1, m_s2;
};

So let's see what happens in various scenarios.

Take 1

string s1, s2; 
Foo f{s1, s2}; // 2 copies for passing by value + 2 copies in the ctor

Argh, this is bad. 4 copies happening here, when only 2 are really needed. In C++03 we'd immediately turn the Foo() arguments into const-refs.

Take 2

Foo(const string &s1, const string &s2) : m_s1{s1}, m_s2{s2} {}

Now we have

Foo f{s1, s2}; // 2 copies in the ctor

That's much better!

But what about moves? For instance, from temporaries:

string function();
Foo f{function(), function()}; // no moves: the temporaries bind to const&, still 2 copies in the ctor

Or when explicitly casting lvalues to rvalues for the ctor:

Foo f{std::move(s1), std::move(s2)}; // no moves: the rvalues bind to const&, still 2 copies in the ctor

That's not that good. We could've used string's move ctor to initialize the Foo members directly.

Take 3

So, we could introduce some overloads for Foo's constructor:

Foo(const string &s1, const string &s2) : m_s1{s1}, m_s2{s2} {}
Foo(string &&s1, const string &s2) : m_s1{std::move(s1)}, m_s2{s2} {}
Foo(const string &s1, string &&s2) : m_s1{s1}, m_s2{std::move(s2)} {}
Foo(string &&s1, string &&s2) : m_s1{std::move(s1)}, m_s2{std::move(s2)} {}

So, ok, now we have

Foo f{function(), function()}; // 2 moves
Foo f2{s1, function()}; // 1 copy + 1 move

Good. But heck, we get a combinatorial explosion: each and every argument now must appear in its const-ref + rvalue variants. What if we get 4 strings? Are we going to write 16 ctors?

Take 4 (the good one)

Let's instead take a look at:

Foo(string s1, string s2) : m_s1{std::move(s1)}, m_s2{std::move(s2)} {}

With this version:

Foo foo{s1, s2}; // 2 copies + 2 moves
Foo foo2{function(), function()}; // at most 2 moves into the parameters (normally elided) + 2 moves in the ctor
Foo foo3{std::move(s1), s2}; // 1 move + 1 copy into the parameters, then 2 moves in the ctor

Since moves are extremely cheap, this pattern lets us fully benefit from them while avoiding the combinatorial explosion. We can indeed move all the way down.
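
The counts for the by-value pattern can be verified with a one-member sketch. Heavy, Sink, Counts, pass_lvalue and pass_xvalue are hypothetical names introduced for illustration; Heavy stands in for a type like string.

```cpp
#include <utility>

// Hypothetical stand-in for a "heavy" type such as std::string.
struct Heavy {
    static int copies;
    static int moves;
    Heavy() = default;
    Heavy(const Heavy&) { ++copies; }
    Heavy(Heavy&&) noexcept { ++moves; }
};
int Heavy::copies = 0;
int Heavy::moves = 0;

// The "Take 4" pattern: take the argument by value, then move it into the member.
struct Sink {
    Heavy member;
    explicit Sink(Heavy h) : member(std::move(h)) {}
};

struct Counts { int copies; int moves; };

// The caller passes an lvalue: one copy into the parameter, one move into the member.
Counts pass_lvalue() {
    Heavy h;
    Heavy::copies = Heavy::moves = 0;
    Sink s{h};
    (void)s;
    return {Heavy::copies, Heavy::moves};
}

// The caller passes std::move(h): one move into the parameter, one move into the member.
Counts pass_xvalue() {
    Heavy h;
    Heavy::copies = Heavy::moves = 0;
    Sink s{std::move(h)};
    (void)s;
    return {Heavy::copies, Heavy::moves};
}
```

Per argument, that is 1 copy + 1 move for an lvalue and 0 copies + 2 moves for std::move(lvalue). The temporary case is left out on purpose, since the number of moves there depends on copy elision.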

And I didn't even scratch the surface of exception safety.


As part of a more general discussion, let's now consider the following snippet, where all the classes involved take s by value and therefore make a copy of it:

{
// some code ...
std::string s = "123";

AClass obj {s};
OtherClass obj2 {s};
Anotherclass obj3 {s};

// s won't be touched any more from here on
}

If I got you correctly, you'd really like that the compiler actually moved s away on its last usage:

{
// some code ...
std::string s = "123";

AClass obj {s};
OtherClass obj2 {s};
Anotherclass obj3 {std::move(s)}; // bye bye s

// s won't be touched any more from here on. 
// hence nobody will notice s is effectively in a "dead" state!
}

I told you why the compiler cannot do that, but I get your point. It would make sense from a certain point of view -- it's nonsense to make s live any longer than its last usage. Food for thought for C++2x, I guess.



Answer 4:

I did some further investigation and asked on other forums on the net.

Unfortunately, it seems that this std::move is necessary not only because the C++ standard says so, but also because applying it automatically would be dangerous:

(Credit to Kalle Olavi Niemitalo from comp.std.c++ for this answer.)

#include <memory>
#include <mutex>
std::mutex m;
int i;
void f1(std::shared_ptr<std::lock_guard<std::mutex> > p);
void f2()
{
    auto p = std::make_shared<std::lock_guard<std::mutex> >(m);
    ++i;
    f1(p);
    ++i;
}

If f1(p) automatically changed to f1(std::move(p)), then the mutex would be unlocked already before the second ++i; statement.
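
The effect can be observed without real threads. In this sketch Guard is a hypothetical stand-in for std::lock_guard that merely records in a flag whether the "lock" is held, and held_after_call is a hypothetical helper; neither appears in the original example.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for std::lock_guard: sets a flag while it is alive.
struct Guard {
    bool& held;
    explicit Guard(bool& flag) : held(flag) { held = true; }
    ~Guard() { held = false; }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
};

void f1(std::shared_ptr<Guard> p) { (void)p; }

// Reports whether the "lock" is still held on the line after the call to f1.
bool held_after_call(bool move_into_f1) {
    bool held = false;
    auto p = std::make_shared<Guard>(held);
    if (move_into_f1) {
        f1(std::move(p)); // f1's parameter becomes the only owner and releases on return
    } else {
        f1(p);            // the caller keeps its own reference, so the Guard stays alive
    }
    return held;
}
```

held_after_call(false) returns true and held_after_call(true) returns false, which is exactly the behavioral change an automatic move would introduce silently.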

The following example seems more realistic:

#include <cstdio>
#include <string>
void f1(std::string s) {}
int main()
{
    std::string s("hello");
    const char *p = s.c_str();
    f1(s);
    std::puts(p);
}

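If f1(s) automatically changed to f1(std::move(s)), then the pointer p could no longer be relied on after f1 returns: the buffer it refers to may have been handed over to f1's parameter and destroyed along with it.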