Consider the following function accept that takes a "universal reference" of type T and forwards it to a parse<T>() function object with one overload for lvalues and one for rvalues:
template<class T>
void accept(T&& arg)
{
    parse<T>()(std::forward<T>(arg), 0); // copy or move, depending on rvaluedness of arg
}
template<class T>
class parse
{
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
    void operator()(T&& arg, int n) const { /* optimized for rvalues */ }
};
Since perfectly forwarding an rvalue may leave the source object in a valid but unspecified (moved-from) state, it is not safe to perfectly forward the same object again within the same scope. Below is my attempt to make as few copies as possible in a hypothetical split() function that takes an int representing the number of passes that have to be made over the input data:
template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg, i);                  // copy n-1 times
    parse<T>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}
Question: is this the recommended way to apply perfect forwarding for multiple passes over the same data? If not, what is a more idiomatic way to minimize the number of copies?
Question: is this the recommended way to apply perfect forwarding for multiple passes over the same data?
Yes, this is the recommended way to apply perfect forwarding (or move) when you need to pass the data multiple times. Only (potentially) move from it on your last access. Indeed, this scenario was foreseen in the original move paper, and is the very reason that "named" variables declared with type rvalue-reference are not implicitly moved from. From N1377:
Even though named rvalue references can bind to an rvalue, they are
treated as lvalues when used. For example:
struct A {};

void h(const A&);
void h(A&&);
void g(const A&);
void g(A&&);

void f(A&& a)
{
    g(a); // calls g(const A&)
    h(a); // calls h(const A&)
}
Although an rvalue can bind to the "a" parameter of f(), once bound, a
is now treated as an lvalue. In particular, calls to the overloaded
functions g() and h() resolve to the const A& (lvalue) overloads.
Treating "a" as an rvalue within f would lead to error prone code:
First the "move version" of g() would be called, which would likely
pilfer "a", and then the pilfered "a" would be sent to the move
overload of h().
If you want h(a) to move in the above example, you have to do so explicitly:
    h(std::move(a)); // calls h(A&&);
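To make the rule from the quote concrete, here is a minimal sketch in which each overload reports which one was selected. The string return values and the helper f returning a pair are assumptions added for illustration; they are not part of N1377.

```cpp
#include <string>
#include <utility>

struct A {};

// Each overload reports which one overload resolution picked.
inline std::string h(const A&) { return "lvalue"; }
inline std::string h(A&&)      { return "rvalue"; }

// Hypothetical helper: returns the overloads chosen for two uses of `a`.
inline std::pair<std::string, std::string> f(A&& a)
{
    std::string first  = h(a);            // `a` has a name, so it is an lvalue here
    std::string second = h(std::move(a)); // explicit cast back to an rvalue
    return {first, second};
}
```

Calling f(A{}) should yield "lvalue" for the first use and "rvalue" for the second, even though the parameter was bound to a temporary.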
As Casey points out in the comments, you have an overloading problem when passing in lvalues:
#include <utility>
#include <type_traits>

template<class T>
class parse
{
    static_assert(!std::is_lvalue_reference<T>::value,
                  "parse: T can not be an lvalue-reference type");
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
    void operator()(T&& arg, int n) const { /* optimized for rvalues */ }
};

template<class T>
void split(T&& arg, int n)
{
    typedef typename std::decay<T>::type Td;
    for (auto i = 0; i < n - 1; ++i)
        parse<Td>()(arg, i);                  // copy n-1 times
    parse<Td>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}
Above I've fixed it as Casey suggests, by instantiating parse<T> only on non-reference types using std::decay. I've also added a static_assert to ensure that the client does not accidentally make this mistake. The static_assert isn't strictly necessary because you will get a compile-time error regardless. However, the static_assert can offer a more readable error message.
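The reason an lvalue-reference T breaks the original parse is reference collapsing: when T is X&, both T const& and T&& collapse to X&, so the two operator() overloads end up with identical signatures. A small sketch of that check follows; the name overloads_collide is hypothetical and only serves the illustration.

```cpp
#include <type_traits>

// Hypothetical helper: true when the two parameter types used by parse<T>
// (T const& and T&&) collapse to the same type, which would make the two
// operator() overloads identical and therefore ill-formed.
template<class T>
constexpr bool overloads_collide()
{
    return std::is_same<T const&, T&&>::value;
}

// With T = int&, both parameter types collapse to int&.
static_assert(overloads_collide<int&>(), "lvalue-reference T collides");
// With a non-reference T the two overloads stay distinct.
static_assert(!overloads_collide<int>(), "non-reference T is fine");
```

This is why passing an lvalue to the original split(), which deduces T as an lvalue reference, fails to compile.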
That is not the only way to fix the problem though. Another way, which would allow the client to instantiate parse with an lvalue reference type, is to partially specialize parse:
template<class T>
class parse<T&>
{
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
};
Now the client doesn't need to do the decay dance:
template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg, i);                  // copy n-1 times
    parse<T>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}
And you can apply special logic under parse<T&> if necessary.
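To check how many copies and moves the partial-specialization variant actually performs, here is a self-contained sketch. The Tracked type and its counters are hypothetical instrumentation, and the operator() bodies (which simply construct a local object) are an assumption standing in for the real parsing work.

```cpp
#include <utility>

// Hypothetical payload that counts how often it is copied or moved.
struct Tracked
{
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(Tracked const&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Primary template: used when T is not an lvalue reference.
template<class T>
class parse
{
public:
    void operator()(T const& arg, int) const { T local(arg); (void)local; }            // copies
    void operator()(T&& arg, int) const { T local(std::move(arg)); (void)local; }      // moves
};

// Partial specialization: used when split() deduces an lvalue reference.
template<class T>
class parse<T&>
{
public:
    void operator()(T const& arg, int) const { T local(arg); (void)local; }            // copies
};

template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg, i);                  // arg is an lvalue here: copy
    parse<T>()(std::forward<T>(arg), n - 1); // last use: move if arg was an rvalue
}
```

With n = 3, passing an lvalue should record three copies and no moves, while passing a temporary should record two copies and one move.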
(I know, it is an old thread)
As stated in the comments, the data is a large array or vector of uint64_t. Rather than tuning parameter passing to avoid one final copy, a better optimization would probably be to restructure the many copy operations so that they
- read once
- write many times (once for each intended pass)
in one step instead of making many independent copies.
A starting point could be the question "faster alternative to memcpy?", whose answers include memcpy-like code. You would have to duplicate the line that writes to the destination so that it writes several copies of the data instead.
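The read-once, write-many idea can be sketched in portable C++ as follows. The function name copy_to_many and the vector-of-vectors interface are assumptions made for this illustration, and nothing here is tuned; whether it beats several plain memcpy calls has to be measured on the target machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills every buffer in `dests` with a copy of `src`,
// loading each source element once and storing it once per destination.
inline void copy_to_many(std::vector<std::uint64_t> const& src,
                         std::vector<std::vector<std::uint64_t>>& dests)
{
    // Note: resize() zero-fills new elements, which is an extra write pass
    // that a tuned version would avoid (for example with preallocated buffers).
    for (auto& d : dests)
        d.resize(src.size());

    for (std::size_t i = 0; i < src.size(); ++i) {
        std::uint64_t const value = src[i]; // single read of the source element
        for (auto& d : dests)
            d[i] = value;                   // one write per intended pass
    }
}
```

Each destination buffer can then be handed to one pass, which is free to modify it in place.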
You can also combine memset, which is optimized for writing the same value to memory over and over again, and memcpy, which is optimized for reading and writing blocks of memory once for each block. You could look into optimized source code here: https://github.com/KNNSpeed/AVX-Memmove
The best code will be specific to the architecture and processor used, so you would have to benchmark the alternatives and compare the speeds you actually achieve.