sum of small double numbers c++

Posted 2019-01-22 22:01

Assume that we have an array of small (about 10^(-15)) double numbers in C++. If we calculate the sum of the numbers in this array sequentially, for example

double sum = 0;
for (int i = 0; i < n; i++) sum+=array[i];

we get some value x.

But if we divide the array into several parts, calculate the sum of each part, and then add all the partial sums together, we get some value x2, which is close to x but not exactly x. So I have lost accuracy in calculating the sum.

Does anyone know how to calculate the sum of small double numbers by partitioning them into parts without losing accuracy?
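
A minimal sketch that reproduces the effect, assuming IEEE 754 doubles with round-to-nearest and no extended-precision intermediates. The function names and the `parts` parameter are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Plain left-to-right sum, as in the loop above.
double sequential_sum(const std::vector<double>& a)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i];
    return sum;
}

// Sum the array in `parts` contiguous chunks, then add the partial sums.
double partitioned_sum(const std::vector<double>& a, std::size_t parts)
{
    if (parts == 0) parts = 1;
    const std::size_t chunk = (a.size() + parts - 1) / parts;  // ceiling division
    double total = 0.0;
    for (std::size_t begin = 0; begin < a.size(); begin += chunk) {
        const std::size_t end = (begin + chunk < a.size()) ? begin + chunk : a.size();
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) partial += a[i];
        total += partial;
    }
    return total;
}
```

Under those assumptions, for the input {1, 2^-53, 2^-53, 2^-53} the sequential loop returns exactly 1, while two chunks return 1 + 2^-52. Neither result is exact; the two orders simply round at different points, and scaling every element by the same power of two (down to the 10^(-15) range, say) does not change that.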

8 answers
[This account has been banned]
Reply #2 · 2019-01-22 22:45

Consider applying the Kahan summation algorithm, either to your entire set or to each of your subsets.

There are other questions referencing this algorithm that can help you.
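
One way to combine the two suggestions is sketched below: each subset is summed with Kahan's compensation, and the partial results are then combined with the same compensated step. The names `Compensated`, `kahan_add` and `chunked_kahan_sum` are hypothetical, and the sketch assumes the compiler does not reassociate floating-point operations (for example, no `-ffast-math`).

```cpp
#include <cstddef>
#include <vector>

// Running sum plus the rounding error that has not been folded in yet.
struct Compensated
{
    double sum = 0.0;
    double correction = 0.0;
};

// One Kahan step: add `value` to `acc` while tracking the low-order bits lost.
inline void kahan_add(Compensated& acc, double value)
{
    const double y = value - acc.correction;
    const double t = acc.sum + y;
    acc.correction = (t - acc.sum) - y;
    acc.sum = t;
}

// Kahan-sum each chunk of `chunk` elements, then Kahan-sum the partial results.
double chunked_kahan_sum(const std::vector<double>& a, std::size_t chunk)
{
    if (chunk == 0) chunk = 1;
    Compensated total;
    for (std::size_t begin = 0; begin < a.size(); begin += chunk) {
        const std::size_t end = (begin + chunk < a.size()) ? begin + chunk : a.size();
        Compensated part;
        for (std::size_t i = begin; i < end; ++i) kahan_add(part, a[i]);
        // The chunk's best estimate is part.sum - part.correction,
        // so both pieces are folded into the grand total.
        kahan_add(total, part.sum);
        kahan_add(total, -part.correction);
    }
    return total.sum;
}
```

Carrying each chunk's leftover correction into the total is a design choice of this sketch; discarding it would still be Kahan summation per subset, but would throw away part of what the compensation recovered.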

做自己的国王
Reply #3 · 2019-01-22 22:51

It could be that your individual summations are being optimised and performed in registers at 80 bits but then transferred back to 64-bit doubles (read about compiler switches). Naturally this would lose precision. If this is the case, then breaking up the array and adding the individual 64-bit sums would give a different answer from adding them all at 80 bits and converting the grand total back.

This may not be the reason, but it might be worth researching further. Look at the chosen answer to this question.
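
A quick way to check whether this applies to a given build is sketched below, assuming a C++11 compiler. `FLT_EVAL_METHOD` is a standard macro from `<cfloat>`; the function names are hypothetical. On GCC for x86, the switches `-mfpmath=sse` and `-ffloat-store` are the ones usually discussed in this context.

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <cstddef>
#include <vector>

// 0: intermediates use the operand type; 1: float and double are evaluated as double;
// 2: everything is evaluated as long double (typical of x87 code);
// negative: indeterminate.
int evaluation_method()
{
    return FLT_EVAL_METHOD;
}

// Accumulates in a double variable, as the loop in the question does.
double sum_in_double(const std::vector<double>& a)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i];
    return sum;
}

// Accumulates in long double on purpose and rounds to double once at the end.
double sum_in_long_double(const std::vector<double>& a)
{
    long double sum = 0.0L;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i];
    return static_cast<double>(sum);
}
```

If `evaluation_method()` returns 0, the plain loop rounds to 64 bits after every addition and the 80-bit explanation does not apply. On platforms where `long double` is no wider than `double`, the two functions behave identically.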

forever°为你锁心
Reply #4 · 2019-01-22 22:54

Loss of precision when adding numbers is no different for very small numbers than for normal-size numbers. What may be relevant is: a) are the RELATIVE differences in size between the numbers large? b) do the numbers have different SIGNS?

The last issue is usually what is at stake with the precision of addition. What you should do (maybe not completely optimal, but a fair shot, and easy to implement) is:

a) split them into subsets of positives and negatives respectively

b) sort each subset

Then

c) take the largest (in absolute size) from the two sets combined, and initialize your sum with that number, and remove it from its list

d) iteratively: whenever the current sum is positive, take the largest remaining negative, add it to the sum, and remove it from its list; whenever the current sum is negative, do likewise with the largest remaining positive.

In this way you have a fair chance of having (almost) minimized the loss of precision to what is inherently unavoidable (given the representation of the numbers).
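
Steps a) to d) could be sketched as follows. The function name `balanced_sum` is hypothetical, and two details are assumptions the steps leave open: zeros (and NaNs) are skipped, and once one list runs out the rest of the other list is added from largest to smallest magnitude.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

double balanced_sum(const std::vector<double>& values)
{
    // a) split by sign
    std::vector<double> pos, neg;
    for (double v : values) {
        if (v > 0.0) pos.push_back(v);
        else if (v < 0.0) neg.push_back(v);
    }
    // b) sort each subset so that the largest magnitude sits at the back
    std::sort(pos.begin(), pos.end());
    std::sort(neg.begin(), neg.end(), std::greater<double>());
    if (pos.empty() && neg.empty()) return 0.0;

    // c) start from the element of largest magnitude overall
    double sum;
    if (neg.empty() || (!pos.empty() && pos.back() >= -neg.back())) {
        sum = pos.back();
        pos.pop_back();
    } else {
        sum = neg.back();
        neg.pop_back();
    }

    // d) keep pulling the running sum back toward zero
    while (!pos.empty() || !neg.empty()) {
        const bool take_negative = !neg.empty() && (sum > 0.0 || pos.empty());
        std::vector<double>& from = take_negative ? neg : pos;
        sum += from.back();
        from.pop_back();
    }
    return sum;
}
```

For {1e16, 1.0, -1e16}, a left-to-right loop in IEEE 754 double arithmetic returns 0 because 1e16 + 1.0 rounds back to 1e16, whereas this ordering cancels the two large terms first and returns 1.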

甜甜的少女心
Reply #5 · 2019-01-22 22:57

Binary floating point numbers used to represent decimal numbers have more precision than accuracy. You have found one way of surfacing the difference.
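
A small illustration of that difference, assuming IEEE 754 doubles: printing a decimal constant with 17 significant digits shows the binary value actually stored. The helper name `show17` is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Formats `x` with 17 significant digits, enough to round-trip an IEEE 754 double.
std::string show17(double x)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", x);
    return std::string(buf);
}
```

With this helper, `show17(0.1)` yields `0.10000000000000001`: the double carries about 17 digits of precision, but only the leading ones agree with the decimal value that was written. The same mismatch is why `0.1 + 0.2 == 0.3` evaluates to false.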

祖国的老花朵
Reply #6 · 2019-01-22 23:00

Using Kahan Summation:

#include <numeric>
#include <iostream>
#include <vector>

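// Running sum together with the rounding error accumulated so far.
struct KahanAccumulation
{
    double sum;
    double correction;
};

// One Kahan step: fold `value` into the accumulation and return the new state.
KahanAccumulation KahanSum(KahanAccumulation accumulation, double value)
{
    KahanAccumulation result;
    double y = value - accumulation.correction;      // re-apply the bits lost last time
    double t = accumulation.sum + y;                 // low-order bits of y may be lost here
    result.correction = (t - accumulation.sum) - y;  // recover what was lost
    result.sum = t;
    return result;
}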

int main()
{
    std::vector<double> numbers = {0.01, 0.001, 0.0001, 0.000001, 0.00000000001};
    KahanAccumulation init = {0};
    KahanAccumulation result =
        std::accumulate(numbers.begin(), numbers.end(), init, KahanSum);

    std::cout << "Kahan Sum: " << result.sum << std::endl;
    return 0;
}

Output:

Kahan Sum: 0.011101


[This account has been banned]
Reply #7 · 2019-01-22 23:01

The trick in those cases is to first sort the array from smallest to largest, and then sum the elements in the loop you've written. That way the small terms accumulate before they meet the large ones, which usually gives better accuracy.

You can also check the Kahan summation algorithm.
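
A minimal sketch of the sort-then-sum idea. Sorting by absolute value rather than by signed value is an assumption made here so that negative numbers are handled sensibly; the function name `sorted_sum` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Takes the vector by value so that the caller's data keeps its order.
double sorted_sum(std::vector<double> values)
{
    // Smallest magnitudes first, so they add up before meeting the large terms.
    std::sort(values.begin(), values.end(),
              [](double a, double b) { return std::fabs(a) < std::fabs(b); });
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum;
}
```

For {1, 2^-53, 2^-53} in IEEE 754 double arithmetic, a left-to-right loop returns 1, while this version adds the two small terms first and returns 1 + 2^-52, which is the exact sum. Sorting costs O(n log n), so for large arrays Kahan summation may be the cheaper option.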
