Question:

Consider the following function accept that takes a "universal reference" of type T and forwards that to a parse<T>() function object with an overload for lvalues and one for rvalues:

template<class T>
void accept(T&& arg)
{
    parse<T>()(std::forward<T>(arg), 0); // copy or move, depending on rvaluedness of arg
}

template<class T>
class parse
{
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
    void operator()(T&& arg     , int n) const { /* optimized for rvalues */ }
};

Since forwarding an rvalue may move from it and leave the source object in a valid but unspecified state, it is not safe to perfectly forward the same object again within the same scope. Below is my attempt to make as few copies as possible in a hypothetical split() function that takes an int representing the number of passes that have to be made over the input data:

template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg , i);                 // copy n-1 times
    parse<T>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}

Question: is this the recommended way to apply perfect forwarding for multiple passes over the same data? If not, what is a more idiomatic way to minimize the number of copies?

Answer 1:

Question: is this the recommended way to apply perfect forwarding for multiple passes over the same data?

Yes, this is the recommended way to apply perfect forwarding (or move) when you need to pass the data multiple times. Only (potentially) move from it on your last access. Indeed, this scenario was foreseen in the original move paper, and is the very reason that "named" variables declared with type rvalue-reference are not implicitly moved from. From N1377:

Even though named rvalue references can bind to an rvalue, they are treated as lvalues when used. For example:

struct A {};

void h(const A&);
void h(A&&);

void g(const A&);
void g(A&&);

void f(A&& a)
{
    g(a);  // calls g(const A&)
    h(a);  // calls h(const A&)
}

Although an rvalue can bind to the "a" parameter of f(), once bound, a is now treated as an lvalue. In particular, calls to the overloaded functions g() and h() resolve to the const A& (lvalue) overloads. Treating "a" as an rvalue within f would lead to error prone code: First the "move version" of g() would be called, which would likely pilfer "a", and then the pilfered "a" would be sent to the move overload of h().

If you want h(a) to move in the above example, you have to do so explicitly:

    h(std::move(a));  // calls h(A&&);
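
The rule quoted from N1377 can be checked with a small sketch. The overloads of g() below return a string naming the overload that was selected, and the two wrapper functions (call_with_name and call_with_move, both hypothetical names introduced only for this illustration) take an A&& parameter and pass it on with and without std::move:

```cpp
#include <cassert>
#include <string>
#include <utility>

struct A {};

// Each overload reports which one overload resolution picked.
std::string g(const A&) { return "lvalue"; }
std::string g(A&&)      { return "rvalue"; }

// Hypothetical wrapper: 'a' has a name, so it is an lvalue inside the body
// even though its declared type is A&&.
std::string call_with_name(A&& a) { return g(a); }

// Hypothetical wrapper: std::move casts 'a' back to an rvalue explicitly.
std::string call_with_move(A&& a) { return g(std::move(a)); }
```

Calling call_with_name(A{}) selects the const A& overload, while call_with_move(A{}) selects the A&& overload.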

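As Casey points out in the comments, your code has an overloading problem when lvalues are passed in: T is then deduced as an lvalue-reference type, so T const& and T&& both collapse to the same reference type and the two operator() overloads of parse<T> collide. The following version fixes that: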

#include <utility>
#include <type_traits>

template<class T>
class parse
{
    static_assert(!std::is_lvalue_reference<T>::value,
                               "parse: T can not be an lvalue-reference type");
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
    void operator()(T&& arg     , int n) const { /* optimized for rvalues */ }
};

template<class T>
void split(T&& arg, int n)
{
    typedef typename std::decay<T>::type Td;
    for (auto i = 0; i < n - 1; ++i)
        parse<Td>()(arg , i);                 // copy n-1 times
    parse<Td>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}

Above I've fixed it as Casey suggests, by instantiating parse<T> only on non-reference types using std::decay. I've also added a static_assert to ensure that the client does not accidentally make this mistake. The static_assert isn't strictly necessary because you will get a compile-time error regardless. However the static_assert can offer a more readable error message.
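
The reason the decay step is needed can be shown at the type level. The sketch below is only an illustration: the alias templates const_lref and fwd_ref and the function collides() are hypothetical names, standing in for the two parameter types that parse<T> declares. When T is A&, both parameter types collapse to A&, so the two operator() declarations would have identical signatures. After std::decay they are distinct again.

```cpp
#include <cassert>
#include <type_traits>

struct A {};

// Hypothetical aliases mirroring the two parameter types used in parse<T>.
template<class T> using const_lref = T const&;
template<class T> using fwd_ref    = T&&;

// True when both overloads of parse<T>::operator() would take the same type.
template<class T>
constexpr bool collides()
{
    return std::is_same<const_lref<T>, fwd_ref<T>>::value;
}

// With T = A&, the const is dropped and the references collapse to A&.
static_assert(std::is_same<const_lref<A&>, A&>::value, "T const& collapses to A&");
static_assert(std::is_same<fwd_ref<A&>, A&>::value, "T&& collapses to A&");

// With the decayed type, the overloads differ: A const& versus A&&.
static_assert(!collides<std::decay<A&>::type>(), "decayed type does not collide");
```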

That is not the only way to fix the problem though. Another way, which would allow the client to instantiate parse with an lvalue reference type, is to partially specialize parse:

template<class T>
class parse<T&>
{
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
};

Now the client doesn't need to do the decay dance:

template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg , i);                 // copy n-1 times
    parse<T>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}

And you can apply special logic under parse<T&> if necessary.
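
To see how many copies and moves this version of split() performs, here is a minimal sketch under some assumptions: Tracked is a hypothetical payload type that counts its copy and move constructions, reset_counts() is a hypothetical helper, and the bodies of operator() simply construct a local object, since the real parsing logic is not shown in the question.

```cpp
#include <cassert>
#include <utility>

// Hypothetical payload that counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(Tracked const&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Hypothetical helper to start each measurement from zero.
void reset_counts() { Tracked::copies = 0; Tracked::moves = 0; }

// Primary template: used when T is not a reference (argument was an rvalue).
template<class T>
class parse
{
public:
    void operator()(T const& arg, int) const { T local(arg); (void)local; }            // copies
    void operator()(T&& arg, int) const { T local(std::move(arg)); (void)local; }      // moves
};

// Partial specialization: used when T is an lvalue reference.
template<class T>
class parse<T&>
{
public:
    void operator()(T const& arg, int) const { T local(arg); (void)local; }            // copies
};

template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg, i);                  // arg is a named lvalue: copy
    parse<T>()(std::forward<T>(arg), n - 1); // move only if the caller passed an rvalue
}
```

With n = 3, passing an lvalue should give three copies and no moves, while passing a temporary should give two copies and one move.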

Answer 2:

(I know, it is an old thread)

As stated in the comments, the data is a large array or vector of uint64_t. Tuning the parameter passing avoids only the final copy out of n. A better optimization would probably be to fuse the many copy operations into one step that does

read once
write many times (once for each intended pass)

instead of making many independent copies.
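
The read-once, write-many idea might look like the following sketch. The function name copy_to_many and its signature are assumptions made for this illustration, not code from the linked answers, and the loop is plain portable C++ rather than a tuned memcpy replacement. Whether it is actually faster than n separate copies depends on the hardware and has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills every buffer in 'dests' with a copy of 'src',
// loading each source element once and storing it to all destinations.
void copy_to_many(std::vector<std::uint64_t> const& src,
                  std::vector<std::vector<std::uint64_t>>& dests)
{
    // Simplification: resize() zero-fills the buffers, which costs an extra
    // write pass. A tuned version would use uninitialized storage instead.
    for (auto& d : dests)
        d.resize(src.size());

    for (std::size_t i = 0; i < src.size(); ++i) {
        std::uint64_t const value = src[i];  // single read of the source element
        for (auto& d : dests)
            d[i] = value;                    // one write per intended pass
    }
}
```

The caller sizes the outer vector to the number of passes, for example dests(3) for three passes, and each inner vector then holds an independent copy that a pass may modify.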

A starting point could be the question "faster alternative to memcpy?", whose answers include memcpy-like code. You would have to duplicate the line of code that writes to the destination so that it writes several copies of the data instead of one.

You can also combine memset, which is optimized for writing the same value to memory over and over again, and memcpy, which is optimized for reading and writing blocks of memory once for each block. You could look into optimized source code here: https://github.com/KNNSpeed/AVX-Memmove

The best code will be specific to the architecture and processor used, so you would have to measure and compare the speed you actually achieve.