Understanding std::accumulate

2019-03-09 12:01发布

问题:

I want to know why std::accumulate (aka reduce) 3rd parameter is needed. For those who do not know what accumulate is, it's used like so:

vector<int> V{1,2,3};  
int sum = accumulate(V.begin(), V.end(), 0);
// sum == 6

Call to accumulate is equivalent to:

sum = 0;  // 0 - value of 3rd param
for (auto x : V)  sum += x;

There is also optional 4th parameter, which allow to replace addition with any other operation.

Rationale that I've heard is that if you need let say not to add up, but multiply elements of a vector, we need other (non-zero) initial value:

vector<int> V{1,2,3};
int product = accumulate(V.begin(), V.end(), 1, multiplies<int>());

But why not do like Python - set initial value for V.begin(), and use range starting from V.begin()+1. Something like this:

int sum = accumulate(V.begin()+1, V.end(), V.begin());

This will work for any op. Why is 3rd parameter needed at all?

回答1:

The way things are, it is annoying for code that knows for sure a range isn't empty and that wants to start accumulating from the first element of the range on. Depending on the operation that is used to accumulate with, it's not always obvious what the 'zero' value to use is.

If on the other hand you only provide a version that requires non-empty ranges, it's annoying for callers that don't know for sure that their ranges aren't empty. An additional burden is put on them.

One perspective is that the best of both worlds is of course to provide both functionality. As an example, Haskell provides both foldl1 and foldr1 (which require non-empty lists) alongside foldl and foldr (which mirror std::transform).

Another perspective is that since the one can be implemented in terms of the other with a trivial transformation (as you've demonstrated: std::transform(std::next(b), e, *b, f) -- std::next is C++11 but the point still stands), it is preferable to make the interface as minimal as it can be with no real loss of expressive power.



回答2:

You're making a mistaken assumption: that type T is of the same type as the InputIterator.

But std::accumulate is generic, and allows all different kinds of creative accumulations and reductions.

Example #1: Accumulate salary across Employees

Here's a simple example: an Employee class, with many data fields.

class Employee {
/** All kinds of data: name, ID number, phone, email address... */
public:
 int monthlyPay() const;
};

You can't meaningfully "accumulate" a set of employees. That makes no sense; it's undefined. But, you can define an accumulation regarding the employees. Let's say we want to sum up all the monthly pay of all employees. std::accumulate can do that:

/** Simple class defining how to add a single Employee's
 *  monthly pay to our existing tally */
auto accumulate_func = [](int accumulator, const Employee& emp)
   return accumulator + emp.monthlyPay();
 };

// And here's how you call the actual calculation:
int TotalMonthlyPayrollCost(const vector<Employee>& V)
{
 return std::accumulate(V.begin(), V.end(), 0, accumulate_func);
}

So in this example, we're accumulating an int value over a collection of Employee objects. Here, the accumulation sum isn't the same type of variable that we're actually summing over.

Example #2: Accumulating an average

You can use accumulate for more complex types of accumulations as well - maybe want to append values to a vector; maybe you have some arcane statistic you're tracking across the input; etc. What you accumulate doesn't have to be just a number; it can be something more complex.

For example, here's a simple example of using accumulate to calculate the average of a vector of ints:

// This time our accumulator isn't an int -- it's a structure that lets us
// accumulate an average.
struct average_accumulate_t
{
    int sum;
    size_t n;
    double GetAverage() const { return ((double)sum)/n; }
};

// Here's HOW we add a value to the average:
auto func_accumulate_average = 
    [](average_accumulate_t accAverage, int value) {
        return average_accumulate_t(
            {accAverage.sum+value, // value is added to the total sum
            accAverage.n+1});      // increment number of values seen
    };

double CalculateAverage(const vector<int>& V)
{
    average_accumulate_t res =
        std::accumulate(V.begin(), V.end(), average_accumulate_t({0,0}), func_accumulate_average)
    return res.GetAverage();
}

Example #3: Accumulate a running average

Another reason you need the initial value is because that value isn't always the default/neutral value for the calculation you're making.

Let's build on the average example we've already seen. But now, we want a class that can hold a running average -- that is, we can keep feeding in new values, and check the average so far, across multiple calls.

class RunningAverage
{
    average_accumulate_t _avg;
public:
    RunningAverage():_avg({0,0}){} // initialize to empty average

    double AverageSoFar() const { return _avg.GetAverage(); }

    void AddValues(const vector<int>& v)
    {
        _avg = std::accumulate(v.begin(), v.end(), 
            _avg, // NOT the default initial {0,0}!
            func_accumulate_average);
    }

};

int main()
{
    RunningAverage r;
    r.AddValues(vector<int>({1,1,1}));
    std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 1.0
    r.AddValues(vector<int>({-1,-1,-1}));
    std::cout << "Running Average: " << r.AverageSoFar() << std::endl; // 0.0
}

This is a case where we absolutely rely on being able to set that initial value for std::accumulate - we need to be able to initialize the accumulation from different starting points.


In summary, std::accumulate is good for any time you're iterating over an input range, and building up one single result across that range. But the result doesn't need to be the same type as the range, and you can't make any assumptions about what initial value to use -- which is why you must have an initial instance to use as the accumulating result.



回答3:

If you wanted accumulate(V.begin()+1, V.end(), V.begin()) you could just write that. But what if you thought v.begin() might be v.end() (i.e. v is empty)? What if v.begin() + 1 is not implemented (because v only implements ++, not generized addition)? What if the type of the accumulator is not the type of the elements? Eg.

std::accumulate(v.begin(), v.end(), 0, [](long count, char c){
   return isalpha(c) ? count + 1 : count
});


回答4:

Because standard library algorithms are supposed to work for arbitrary ranges of (compatible) iterators. So the first argument to accumulate doesn't have to be begin(), it could be any iterator between begin() and one before end(). It could also be using reverse iterators.

The whole idea is to decouple algorithms from data. Your suggestion, if I understand it correctly, requires a certain structure in the data.



回答5:

It's indeed not needed. Our codebase has 2 and 3-argument overloads which use a T{} value.

However, std::accumulate is pretty old; it comes from the original STL. Our codebase has fancy std::enable_if logic to distinguish between "2 iterators and initial value" and "2 iterators and reduction operator". That requires C++11. Our code also uses a trailing return type (auto accumulate(...) -> ...) to calculate the return type, another C++11 feature.