value semantics vs output params with large data s

2019-02-08 22:59发布

2013 Keynote: Chandler Carruth: Optimizing the Emergent Structures of C++

  • 42:45
    You don't need output parameters, we have value semantics in C++. ... Anytime you see someone arguing that nonono I'm not going to return by value because copy would cost too much, someone working on an optimizer says they're wrong. All right? I have never yet seen a piece of code where that argument was correct. ... People don't realize how important value semantics are to the optimizer because it completely clarifies the aliasing scenarios.

Can anyone put this in the context of this answer: https://stackoverflow.com/a/14229152

I hear that being repeated on and on but, well, for me a function returning something is a source. Output parameters by reference take that characteristic out of a function, and removing such hard coded characteristic from a function allows one to manage outside instead, how output will be stored/reused.

My question is, even in the context of that SO answer, is there a way to tell, restructuring the code some other equivalent way, "ok now see, value semantics in this way doesn't lose for the output param version", or Chandler comments were specific to some contrived situations? I had even seen Andrei Alexandrescu arguing this in a talk and telling you can't escape using by ref output for better performance.

For another take on Andrei's comments see Eric Niebler: Out Parameters, Move Semantics, and Stateful Algorithms.

3条回答
放荡不羁爱自由
2楼-- · 2019-02-08 23:39

I was about to implement a solution using string_view and ranges and then found this:

This endorses value semantics output by return and I accept it as prettier than the current choice in the referred SO answer. This design also would take the characteristic of being a source out of a function even though it is returning. Simple illustration: one could have an outer reserved vector being repetitively filled by a returned range.

In any case, I'm not sure whether such version of split would help the optimizer in any sense (I'm talking in context to Chandler's talk here).

Notice

An output param version enforces the existence of a named variable at call site, which may be ugly to the eyes, but may turn debugging the call site always easier.

Sample solution

While std::split doesn't arrive, I've exercised the value semantics output by return version this way:

#include <string>
#include <string_view>
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/iterator/transform_iterator.hpp>

using namespace std;
using namespace std::experimental;
using namespace boost;

string_view stringfier(const cregex_token_iterator::value_type &match) {
    return {match.first, static_cast<size_t>(match.length())};
}

using string_view_iterator =
    transform_iterator<decltype(&stringfier), cregex_token_iterator>;

iterator_range<string_view_iterator> split(string_view s, const regex &r) {
    return {
        string_view_iterator(
            cregex_token_iterator(s.begin(), s.end(), r, -1),
            stringfier
        ),
        string_view_iterator()
    };
}

int main() {
    const regex r(" +");
    for (size_t i = 0; i < 1000000; ++i) {
        split("a b c", r);
    }
}

I have used Marshall Clow's string_view libc++ implementation found at https://github.com/mclow/string_view.

I have posted the timings at the bottom of the referred answer.

查看更多
乱世女痞
3楼-- · 2019-02-08 23:59

The problem with output parameters as described in the linked question is that they generally make the common calling case (i.e., you don't have a vector storage to use already) much more verbose than normal. For example, if you used return by value:

auto a = split(s, r);

if you used output parameters:

std::vector<std::string> a;
split(s,r,a);

The second one looks much less pretty to my eyes. Also, as Chandler mentioned, the optimizer can do much more with the first one than the second, depending on the rest of your code.

Is there a way we can get the best of both worlds? Emphatically yes, using move semantics:

std::vector<std::string> split(const std::string &s, const std::regex &r, std::vector<std::string> v = {})
{
    auto rit = std::sregex_token_iterator(s.begin(), s.end(), r, -1);
    auto rend = std::sregex_token_iterator();
    v.clear();
    while(rit != rend)
    {
        v.push_back(*rit);
        ++rit;
    }

    return v;
}

Now, in the common case, we can call split as normal (i.e., the first example) and it will allocate a new vector storage for us. In the important but rare case when we have to split repeatedly and we want to re-use the same storage, we can just move in a storage that persists between calls:

int main()
{
    const std::regex r(" +");
    std::vector<std::string> a;
    for(auto i=0; i < 1000000; ++i)
      a = split("a b c", r, std::move(a));
    return 0;
}

This runs just as fast as the output-argument method and what's going on is quite clear. You don't have to make your function hard to use all the time just to get good performance some of the time.

查看更多
Summer. ? 凉城
4楼-- · 2019-02-09 00:01

It's either an overstatement, generalization, a joke, or Chandler's idea of "Perfectly Reasonable Performance" (using modern C++ toolchains/libs) is unacceptable to my programs.

I find it a fairly narrow scope of optimizations. Penalties exist beyond that scope which cannot be ignored due to the actual complexities and designs found in programs - heap allocations were an example for the getline example. The specific optimizations may or may not always be applicable to the program in question, despite your attempts to reduce them. Real world structures will reference memory which may alias. You can reduce that, but it's not practical to believe that you can eliminate aliasing (from an optimizer's perspective).

Of course, RBV can be a great thing - it's just not appropriate for all cases. Even the link you referenced pointed out how one can avoid a ton of allocations/frees. Real programs and the data structures found in them are far more complex.

Later in the talk, he goes on to criticize the use of member functions (ref: S::compute()). Sure, there is a point to take away, but is it really reasonable to avoid using these language features entirely because it makes the optimizer's job easier? No. Will it always result in more readable programs? No. Will these code transformations always result in measurably faster programs? No. Are the changes required to transform your codebase worth the time you invest? Sometimes. Can you take away some points and make better informed decisions which impact some of your existing or future codebase? Yes.

Sometimes it helps to break down how exactly your program would execute, or what it would look like in C.

The optimizer will not solve all performance problems, and you should not rewrite programs with the assumption that the programs you are dealing with are "completely brain dead and broken designs", nor should you believe that using RBV will always result in "Perfectly Reasonable Performance". You can utilize new language features and make the optimizer's job easier, though there is much to gain there are often more important optimizations to invest your time on.

It's fine to consider the proposed changes; ideally you would measure the impact of such changes in real world execution times and impact on your source code before adopting these suggestions.

For your example: Even copying+assigning large structures by value can have significant costs. Beyond the cost of running constructors and destructors (along with their associated creation/cleanup of the resources they acquire and own, as pointed out in the link you reference), even things as simple as avoiding unnecessary structure copies can save you a ton of CPU time if you use references (where appropriate). The structure copy may be as simple as a memcpy. These aren't contrived problems; they appear in actual programs and the complexity can increase greatly with your program's complexity. Is reducing aliasing of some memory and other optimizations worth the costs, and does it result in "Perfectly Reasonable Performance"? Not always.

查看更多
登录 后发表回答