Assume that we have an array of small (about 10^(-15)) double numbers in C++. If we calculate the sum of the numbers in this array sequentially, for example

    double sum = 0;
    for (int i = 0; i < n; i++) sum += array[i];

we get some value x.
But if we divide the array into several parts, calculate the sum of each part, and then add all the partial sums together, we get some value x2, which is close to x but not exactly x. So I have lost accuracy in calculating the sum.

Does anyone know how to calculate the sum of small double numbers by partitioning them into parts without losing accuracy?
Consider applying the Kahan summation algorithm, either to your entire set or to each of your subsets. There are other questions referencing this algorithm that can help you.
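A minimal sketch of what that could look like in C++ (the function names and the partitioning scheme are mine, purely for illustration):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Kahan (compensated) summation: a second variable carries the low-order
    // bits that each addition to the running sum would otherwise discard.
    double kahan_sum(const double* a, std::size_t n) {
        double sum = 0.0;
        double c = 0.0;                  // running compensation for lost bits
        for (std::size_t i = 0; i < n; ++i) {
            double y = a[i] - c;         // apply the correction to the new term
            double t = sum + y;          // low-order bits of y are lost here...
            c = (t - sum) - y;           // ...and recovered here
            sum = t;
        }
        return sum;
    }

    // Partitioned variant: Kahan-sum each part, then Kahan-sum the partial
    // sums themselves. Assumes parts >= 1.
    double partitioned_kahan_sum(const std::vector<double>& a, std::size_t parts) {
        std::size_t chunk = (a.size() + parts - 1) / parts;   // ceiling division
        std::vector<double> partial;
        for (std::size_t begin = 0; begin < a.size(); begin += chunk) {
            std::size_t len = std::min(chunk, a.size() - begin);
            partial.push_back(kahan_sum(a.data() + begin, len));
        }
        return kahan_sum(partial.data(), partial.size());
    }

One caveat: aggressive floating-point optimisation (e.g. -ffast-math on GCC/Clang) lets the compiler reassociate (t - sum) - y away to zero, silently turning this back into naive summation, so compile with standard floating-point semantics.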
It could be that your individual summations are being optimised and performed in registers at 80 bits, but then transferred back to 64-bit doubles (read about compiler switches). Naturally this would lose precision. If this is the case, then breaking up the array and adding the individual 64-bit sums would give a different answer from adding them all as 80-bit values and converting the grand total back.

This may not be the reason, but it might be worth researching further. Look at the chosen answer to this question.
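If that is what is happening, one way to get consistent results is to keep the accumulator wide yourself and round only once at the end. A sketch, assuming long double is actually wider than double on your platform (it is 80-bit extended precision on most x86 GCC/Clang targets, but just a 64-bit double under MSVC):

    // Accumulate at extended precision and round to 64-bit double only once.
    double sum_extended(const double* a, int n) {
        long double sum = 0.0L;          // 80-bit extended on many x86 toolchains
        for (int i = 0; i < n; ++i)
            sum += a[i];
        return static_cast<double>(sum); // a single final rounding
    }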
Loss of precision when adding numbers is no different for very small numbers than for normal-size numbers. What may be relevant is: a) are the RELATIVE differences in size between the numbers large? b) do the numbers have different SIGNS?

The second issue is usually what is at stake with addition precision. What you should do - maybe not completely optimal, but a fair shot, and easy to implement - is:
a) split them into subsets of positives and negatives respectively
b) sort each subset
Then
c) take the largest (in absolute size) number from the two sets combined, initialize your sum with that number, and remove it from its list

d) iteratively: whenever the current sum is positive, take the largest remaining negative number, add it to the sum, and remove it from its list; whenever the current sum is negative, do likewise with the largest remaining positive number.
In this way you have a fair chance that you have (almost) minimized the loss of precision to what is inherently unavoidable (given the representation of the numbers).
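A sketch of steps a)-d) in C++ (the function name is mine; with the sum initialized to zero, the zero-sum tie-break on the first iteration reproduces step c by picking the larger of the two candidates):

    #include <algorithm>
    #include <cmath>
    #include <functional>
    #include <vector>

    // Sign-balanced summation: keep the running sum near zero so that terms
    // cancel each other instead of one swamping the other.
    double balanced_sum(const std::vector<double>& xs) {
        std::vector<double> pos, neg;                 // a) split by sign
        for (double x : xs) (x >= 0.0 ? pos : neg).push_back(x);
        std::sort(pos.begin(), pos.end());            // b) largest positive at the back
        std::sort(neg.begin(), neg.end(), std::greater<double>()); // most negative at the back

        double sum = 0.0;
        while (!pos.empty() || !neg.empty()) {
            bool take_negative;
            if (pos.empty())       take_negative = true;
            else if (neg.empty())  take_negative = false;
            else if (sum > 0.0)    take_negative = true;   // d) steer the sum toward zero
            else if (sum < 0.0)    take_negative = false;
            else                   take_negative = std::fabs(neg.back()) > pos.back(); // c)
            if (take_negative) { sum += neg.back(); neg.pop_back(); }
            else               { sum += pos.back(); pos.pop_back(); }
        }
        return sum;
    }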
Binary floating point numbers used to represent decimal numbers have more precision than accuracy. You have found one way of surfacing the difference.
Using Kahan summation: a small program along the following lines (the data here is made up to mimic the question) shows the naive sum and the compensated sum disagreeing in the trailing digits:
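    #include <cstdio>
    #include <vector>

    int main() {
        // One value of 1.0 followed by ten million values of about 1e-15:
        // each small addend lands near the last bits of the running sum,
        // so the naive loop loses a little information on every addition.
        std::vector<double> a(10000001, 1e-15);
        a[0] = 1.0;

        double naive = 0.0;
        for (double x : a) naive += x;

        double sum = 0.0, c = 0.0;        // Kahan running sum + compensation
        for (double x : a) {
            double y = x - c;
            double t = sum + y;
            c = (t - sum) - y;
            sum = t;
        }

        std::printf("naive: %.17g\nkahan: %.17g\n", naive, sum);
        return 0;
    }

Printed to 17 significant digits, the two results typically differ in their trailing digits, with the compensated sum staying much closer to the true total.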
The trick in these cases is to first sort the array from smallest to largest, and then sum it in the loop you already have. That way, the accuracy is best.
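A minimal sketch, assuming the values are all non-negative as in the question (with mixed signs you would sort by absolute value instead):

    #include <algorithm>
    #include <vector>

    // Summing in increasing order keeps each new term as close as possible
    // in magnitude to the running sum, which reduces the rounding error of
    // each individual addition.
    double sorted_sum(std::vector<double> a) {  // take a copy so sorting is local
        std::sort(a.begin(), a.end());          // smallest first
        double sum = 0.0;
        for (double x : a) sum += x;
        return sum;
    }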
You can also check the Kahan summation algorithm.