Composability of STL algorithms

2019-03-10 15:24发布

问题:

The STL algorithms are a pretty useful thing in C++. But one thing that kind of irks me is that they seem to lack composability.

For example, let's say I have a vector<pair<int, int>> and want to transform that to a vector<int> containing only the second member of the pair. That's simple enough:

std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;

std::transform(values.begin(), values.end(), std::back_inserter(result),
    [] (std::pair<int, int> p) { return p.second; });

Or maybe I want to filter the vector for only those pairs whose first member is even. Also pretty simple:

std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> result;

std::copy_if(values.begin(), values.end(), std::back_inserter(result),
    [] (std::pair<int, int> p) { return (p.first % 2) == 0; });

But what if I want to do both? There is no transform_if algorithm, and using both transform and copy_if seems to require allocating a temporary vector to hold the intermediate result:

std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> temp;
std::vector<int> result;

std::copy_if(values.begin(), values.end(), std::back_inserter(temp),
    [] (std::pair<int, int> p) { return (p.first % 2) == 0; });

std::transform(values.begin(), values.end(), std::back_inserter(result),
    [] (std::pair<int, int> p) { return p.second; });

This seems rather wasteful to me. The only way I can think of to avoid the temporary vector is to abandon transform and copy_if and simply use for_each (or a regular for loop, whichever suits your fancy):

std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;

std::for_each(values.begin(), values.end(),
    [&result] (std::pair<int, int> p) 
        { if( (p.first % 2) == 0 ) result.push_back(p.second); });

Am I missing something here? Is there a good way to compose two existing STL algorithms into a new one without needing temporary storage?

回答1:

You're right. You can use Boost.Range adaptors to achieve composition.



回答2:

I think the problem is unfortunately structural

  1. C++ uses two iterators to represent a sequence
  2. C++ functions are single-valued

so you cannot chain them because a function cannot return "a sequence".

An option would have been to use single-object sequences instead (like the range approach from boost). This way you could have combined the result of one processing as the input of another... (one object -> one object).

In the standard C++ library instead the processing is (two objects -> one object) and it's clear that this cannot be chained without naming the temporary object.



回答3:

Back in 2000, the problem was already noted. Gary Powell and Martin Weiser came up with a "view" concept, and coined the name "View Template Library". It didn't take off then but the idea makes sense. A "view" adaptor essentially applies an on-the-fly transform. For instance, it can adapt the value_type.

The concept probably should be readdressed now we have C++0x. We've made quite some progress in generic programming since 2000.

For example, let's use the vector<pair<int, int>> to vector<int> example. That could be quite simple:

std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, [](std::pair<int, int> p) { return p.first }); 
std::vector<int> result(view.begin(), view.end());

Or, using the boost::bind techniques, even simpler:

std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, &std::pair<int, int>::first); 
std::vector<int> result(view.begin(), view.end());


回答4:

Not sure if this is still active, but... A new light wait header only lib that does what you describe. Doc talks about lazy evaluation and com compossible generators.

Doc snippet:

  • Read in up to 10 integers from a file "test.txt".
  • filter for the even numbers, square them and sum their values.
    int total = lz::read<int>(ifstream("test.txt")) | lz::limit(10) |
                lz::filter([](int i) { return i % 2 == 0; }) |
                lz::map([](int i) { return i *  i; }) | lz::sum();

you can split that line up into multiple expressions.

    auto numbers = lz::read<int>(ifstream("test.txt")) | lz::limit(10);
    auto evenFilter = numbers | lz::filter([](int i) { return i % 2 == 0; });
    auto squares = evenFilter | lz::map([](int i) { return i *  i; });
    int total = squares | lz::sum();
  • Even though this expression is split over multiple variable assignments, it is not any less efficient.
  • Each intermediate variable simply describes a unit of code to be executed. All held in stack.

https://github.com/SaadAttieh/lazyCode