Consider the following code:
#include <iostream>
#include <type_traits>
#include <utility>
struct A
{
A() {}
A(const A&) { std::cout << "Copy" << std::endl; }
A(A&&) { std::cout << "Move" << std::endl; }
};
template <class T>
struct B
{
T x;
};
#define MAKE_B(x) B<decltype(x)>{ x }
template <class T>
B<T> make_b(T&& x)
{
return B<T> { std::forward<T>(x) };
}
int main()
{
std::cout << "Macro make b" << std::endl;
auto b1 = MAKE_B( A() );
std::cout << "Non-macro make b" << std::endl;
auto b2 = make_b( A() );
}
This outputs the following:
Macro make b
Non-macro make b
Move
Note that b1 is constructed without a move, but the construction of b2 requires a move.
I also need type deduction, as A in real-life usage may be a complex type that is difficult to write explicitly. I also need to be able to nest calls (i.e. make_c(make_b(A()))).
Is such a function possible?
Further thoughts:
N3290 Final C++0x draft page 284:
This elision of copy/move operations,
called copy elision, is permitted in
the following circumstances:
when a temporary class object that has
not been bound to a reference (12.2)
would be copied/moved to a class
object with the same cv-unqualified
type, the copy/move operation can be
omitted by constructing the temporary
object directly into the target of the
omitted copy/move
Unfortunately, this seems to mean that we can't elide copies (and moves) of function parameters to function results (including constructors), as those temporaries are either bound to a reference (when passed by reference) or no longer temporaries (when passed by value). It seems the only way to elide all copies when creating a composite object is to create it as an aggregate. However, aggregates have certain restrictions, such as requiring that all members be public and that there be no user-defined constructors.
I don't think it makes sense for C++ to allow these optimizations for aggregate construction of POD C-structs but not for construction of non-POD C++ classes.
Is there any way to allow copy/move elision for non-aggregate construction?
My answer:
This construct allows copies to be elided for non-POD types. I got this idea from David Rodríguez's answer below. It requires C++11 lambdas. In the example below I've changed make_b to take two arguments to make things less trivial. There are no calls to any move or copy constructors.
#include <iostream>
#include <type_traits>
struct A
{
A() {}
A(const A&) { std::cout << "Copy" << std::endl; }
A(A&&) { std::cout << "Move" << std::endl; }
};
template <class T>
class B
{
public:
template <class LAMBDA1, class LAMBDA2>
B(const LAMBDA1& f1, const LAMBDA2& f2) : x1(f1()), x2(f2())
{
std::cout
<< "I'm a non-trivial, therefore not a POD.\n"
<< "I also have private data members, so definitely not a POD!\n";
}
private:
T x1;
T x2;
};
#define DELAY(x) [&]{ return x; }
#define MAKE_B(x1, x2) make_b(DELAY(x1), DELAY(x2))
template <class LAMBDA1, class LAMBDA2>
auto make_b(const LAMBDA1& f1, const LAMBDA2& f2) -> B<decltype(f1())>
{
return B<decltype(f1())>( f1, f2 );
}
int main()
{
auto b1 = MAKE_B( A(), A() );
}
If anyone knows how to achieve this more neatly I'd be quite interested to see it.
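One way to check the "no calls to any move or copy constructors" claim without reading console output is to count constructor calls. The following is a minimal sketch of the same delayed-construction idea, not the exact code above: Counted, Pair, make_pair_delayed and count_copies_and_moves are hypothetical names introduced for illustration, and plain lambdas replace the DELAY macro. Under C++17 the elision here is guaranteed; under C++11/14 it is only permitted, although mainstream compilers perform it by default.

```cpp
// Hypothetical instrumented type: counts copy and move constructions.
struct Counted
{
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Same shape as B above: each member is initialized from the result of
// calling a factory, not from an already-constructed argument.
template <class T>
class Pair
{
public:
    template <class F1, class F2>
    Pair(const F1& f1, const F2& f2) : x1(f1()), x2(f2()) {}
private:
    T x1;
    T x2;
};

template <class F1, class F2>
auto make_pair_delayed(const F1& f1, const F2& f2) -> Pair<decltype(f1())>
{
    return Pair<decltype(f1())>(f1, f2);
}

// Builds one Pair<Counted> and returns how many copy plus move
// constructions of Counted were observed while doing so.
int count_copies_and_moves()
{
    Counted::copies = 0;
    Counted::moves = 0;
    auto p = make_pair_delayed([] { return Counted(); },
                               [] { return Counted(); });
    (void)p; // silence unused-variable warnings
    return Counted::copies + Counted::moves;
}
```

Calling count_copies_and_moves() from main and printing the result shows whether a given compiler and set of flags constructs both members in place.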
Previous discussion:
This somewhat follows on from the answers to the following questions:
Can creation of composite objects from temporaries be optimised away?
Avoiding need for #define with expression templates
Eliminating unnecessary copies when building composite objects
As Anthony has already mentioned, the standard forbids copy elision from the argument of a function to the return of the same function. The rationale that drives that decision is that copy elision (and move elision) is an optimization by which two objects in the program are merged into the same memory location, that is, the copy is elided by having both objects be one. The (partial) standard quote is below, followed by a set of circumstances under which copy elision is allowed, which do not include that particular case.
So what makes that particular case different? The difference is that there is a function call between the original and the copied objects, and a function call brings extra constraints to consider, in particular the calling convention.
Given a function T foo( T ), and a user calling T x = foo( T(param) );, in the general case, with separate compilation, the compiler will create an object $tmp1 in the location where the calling convention requires the first argument to be. It will then call the function and initialize x from the return statement. Here is the first opportunity for copy elision: by carefully placing x at the location where the returned temporary is, x and the returned object from foo become a single object, and that copy is elided. So far so good. The problem is that the calling convention in general will not have the returned object and the parameter in the same location, and because of that, $tmp1 and x cannot be a single location in memory.
Without seeing the function definition the compiler cannot possibly know that the only purpose of the argument to the function is to serve as the return value, and as such it cannot elide that extra copy. It can be argued that if the function is inline then the compiler would have the missing information needed to see that the temporary used to call the function, the returned value and x are a single object. The problem is that this particular copy can only be elided if the code is actually inlined (not merely marked as inline, but actually inlined). If a function call is required, then the copy cannot be elided. If the standard allowed that copy to be elided when the code is inlined, the behavior of a program would differ because of the compiler and not because of user code. The inline keyword does not force inlining; it only means that multiple definitions of the same function do not represent a violation of the ODR.
Note that if the variable is created inside the function (as opposed to being passed into it), as in T foo() { T tmp; ...; return tmp; } T x = foo();, then both copies can be elided. There is no restriction as to where tmp has to be created: it is not an input or output parameter of the function, so the compiler is able to relocate it anywhere, including the location of the returned object. On the calling side, x can, as in the previous example, be carefully placed in the location of that same returned object, which basically means that tmp, the returned object and x can be a single object.
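The contrast between returning a parameter and returning a local can be made observable by counting constructor calls. This is a minimal sketch under stated assumptions: Tracked, from_parameter, from_local, Counts and the two measure functions are hypothetical names that do not appear in the discussion above, and the local-variable case depends on the compiler choosing to apply the optional named return value optimization.

```cpp
#include <cassert>

// Hypothetical instrumented type that counts copy and move constructions.
struct Tracked
{
    static int copies;
    static int moves;
    Tracked() {}
    explicit Tracked(int) {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// The parameter lives where the calling convention puts arguments, so
// returning it needs a real move into the returned object.
Tracked from_parameter(Tracked t) { return t; }

// A local may be built directly in the returned object (NRVO), if the
// compiler chooses to do so.
Tracked from_local() { Tracked tmp; return tmp; }

struct Counts { int copies; int moves; };

Counts measure_parameter()
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked x = from_parameter(Tracked(1));
    (void)x;
    // Elision from a function parameter is not permitted, so at least
    // one move must have happened here.
    assert(Tracked::moves >= 1);
    Counts c = { Tracked::copies, Tracked::moves };
    return c;
}

Counts measure_local()
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked x = from_local();
    (void)x;
    Counts c = { Tracked::copies, Tracked::moves };
    return c;
}
```

Comparing the moves field of the two results on a given compiler shows how much of the permitted elision it actually performs; only the move out of the parameter is required by the rules quoted here.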
As for your particular problem, if you resort to a macro, the code is inlined, there are no restrictions on the objects and the copy can be elided. But if you add a function, you cannot elide the copy from the argument to the return statement. So just avoid it. Instead of using a template that will move the object, create a template that will construct an object:
template <typename T, typename... Args>
T create( Args... x ) {
return T( x... );
}
And that copy can be elided by the compiler.
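As a variation on the create template above, the constructor arguments can be taken by forwarding reference so that the arguments themselves are not copied on the way in. This is a sketch under my own assumptions rather than part of the original suggestion: Widget and widget_copies_and_moves are hypothetical names used only to count constructor calls.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type with a two-argument constructor and counters for
// its copy and move constructors.
struct Widget
{
    static int copies;
    static int moves;
    int id;
    std::string name;
    Widget(int i, std::string n) : id(i), name(std::move(n)) {}
    Widget(const Widget& o) : id(o.id), name(o.name) { ++copies; }
    Widget(Widget&& o) : id(o.id), name(std::move(o.name)) { ++moves; }
};
int Widget::copies = 0;
int Widget::moves = 0;

// Forwards constructor arguments instead of an already-built object,
// so the result is constructed in the return statement itself.
template <typename T, typename... Args>
T create(Args&&... args)
{
    return T(std::forward<Args>(args)...);
}

// Returns the number of Widget copy plus move constructions observed
// while creating a single Widget through create().
int widget_copies_and_moves()
{
    Widget::copies = 0;
    Widget::moves = 0;
    Widget w = create<Widget>(42, "gadget");
    assert(w.id == 42 && w.name == "gadget");
    return Widget::copies + Widget::moves;
}
```

Because create returns a temporary built from the arguments, the only copies left to elide are the ones the standard permits (and, from C++17 on, requires) to be elided.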
Note that I have not dealt with move construction, as you seem concerned about the cost of even move construction, even though I believe that you are barking up the wrong tree. Given a motivating real use case, I am quite sure that people here will come up with a couple of efficient ideas.
12.8/31
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects. In such cases, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization.
... but the construction of b2 requires a move.
No, it doesn't. The compiler is allowed to elide the move; whether that happens is implementation-specific, depending on several factors. It is also allowed to move, but it cannot copy (moving must be used instead of copying in this situation).
It is true that you are not guaranteed that the move will be elided. If you must be guaranteed that no move will occur, then either use the macro or investigate your implementation's options to control this behavior, particularly function inlining.
You cannot optimize out the copy/move of the A object from the parameter of make_b to the member of the created B object.
However, this is the whole point of move semantics: by providing a lightweight move operation for A you can avoid a potentially expensive copy. For example, if A were actually std::vector<int>, then copying the vector's contents can be avoided by using the move constructor, and only the housekeeping pointers will be transferred.
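The pointer transfer described above can be observed directly: after move construction, the destination vector owns the same heap buffer the source had. A minimal sketch, where move_reuses_buffer is a hypothetical helper name:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns true if the moved-to vector took over the source's buffer
// instead of allocating a new one and copying the elements.
bool move_reuses_buffer()
{
    std::vector<int> src(1000, 7);
    const int* before = src.data();       // address of the element storage
    std::vector<int> dst(std::move(src)); // move construction
    assert(dst.size() == 1000);
    return dst.data() == before;
}
```

The cost of the move is independent of the number of elements, which is why a move that is not elided is usually cheap for types like this.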
This isn't a big problem. All it needs is changing the structure of the code slightly.
Instead of:
B<A> create(A &&a) { ... }
int main() { auto b = create(A()); }
You can always do:
int main() { A a; B<A> b(a); ... }
If the constructor of B is like this, then it'll not take any copies:
template<class T>
class B { public: B(T &t) :t(t) { } private: T &t; };
The composite case will work too:
// A and B stand in here for any types assignable from an int.
struct C { A a; B b; };
void init(C &c) { c.a = 10; c.b = 20; }
int main() { C c; init(c); }
And it doesn't even need C++0x features to do this.