Weird numpy.sum behavior when adding zeros

I understand how mathematically-equivalent arithmentic operations can result in different results due to numerical errors (e.g. summing floats in different orders).

However, it surprises me that adding zeros to sum can change the result. I thought that this always holds for floats, no matter what: x + 0. == x.

Here's an example. I expected all the lines to be exactly zero. Can anybody please explain why this happens?

M = 4  # number of random values
Z = 4  # number of additional zeros
for i in range(20):
    a = np.random.rand(M)
    b = np.zeros(M+Z)
    b[:M] = a
    print a.sum() - b.sum()

-4.4408920985e-16
0.0
0.0
0.0
4.4408920985e-16
0.0
-4.4408920985e-16
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
2.22044604925e-16
0.0
4.4408920985e-16
4.4408920985e-16
0.0

It seems not to happen for smaller values of M and Z.

I also made sure a.dtype==b.dtype.

Here is one more example, which also demonstrates python's builtin sum behaves as expected:

a = np.array([0.1,      1.0/3,      1.0/7,      1.0/13, 1.0/23])
b = np.array([0.1, 0.0, 1.0/3, 0.0, 1.0/7, 0.0, 1.0/13, 1.0/23])
print a.sum() - b.sum()
=> -1.11022302463e-16
print sum(a) - sum(b)
=> 0.0

I'm using numpy V1.9.2.

标签： python numpy sum numerical-stability

1条回答

唯我独甜

2楼-- · 2019-03-14 20:37

Short answer: You are seeing the difference between

a + b + c + d

and

(a + b) + (c + d)

which because of floating point inaccuracies is not the same.

Long answer: Numpy implements pair-wise summation as an optimization of both speed (it allows for easier vectorization) and rounding error.

The numpy sum-implementation can be found here (function pairwise_sum_@TYPE@). It essentially does the following:

If the length of the array is less than 8, a regular for-loop summation is performed. This is why the strange result is not observed if W < 4 in your case - the same for-loop summation will be used in both cases.
If the length is between 8 and 128, it accumulates the sums in 8 bins r[0]-r[7] then sums them by ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7])).
Otherwise, it recursively sums two halves of the array.

Therefore, in the first case you get a.sum() = a[0] + a[1] + a[2] + a[3] and in the second case b.sum() = (a[0] + a[1]) + (a[2] + a[3]) which leads to a.sum() - b.sum() != 0.

0人赞添加讨论(0) 举报

Weird numpy.sum behavior when adding zeros

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间