I understand how mathematically equivalent arithmetic operations can produce different results due to numerical errors (e.g. summing floats in different orders).
However, it surprises me that adding zeros to sum can change the result. I thought that x + 0. == x always holds for floats, no matter what.
Here's an example. I expected all the lines to be exactly zero. Can anybody please explain why this happens?
import numpy as np

M = 4  # number of random values
Z = 4  # number of additional zeros
for i in range(20):
    a = np.random.rand(M)
    b = np.zeros(M + Z)
    b[:M] = a  # b holds the same values as a, followed by Z zeros
    print(a.sum() - b.sum())
-4.4408920985e-16
0.0
0.0
0.0
4.4408920985e-16
0.0
-4.4408920985e-16
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
2.22044604925e-16
0.0
4.4408920985e-16
4.4408920985e-16
0.0
It seems not to happen for smaller values of M and Z. I also made sure that a.dtype == b.dtype.
Here is one more example, which also demonstrates that Python's built-in sum behaves as expected:
import numpy as np

a = np.array([0.1, 1.0/3, 1.0/7, 1.0/13, 1.0/23])
b = np.array([0.1, 0.0, 1.0/3, 0.0, 1.0/7, 0.0, 1.0/13, 1.0/23])
print(a.sum() - b.sum())  # => -1.11022302463e-16
print(sum(a) - sum(b))    # => 0.0
I'm using numpy V1.9.2.
Short answer: You are seeing the difference between a + b + c + d and (a + b) + (c + d), which, because of floating-point inaccuracies, is not the same.
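To see that a change of grouping alone is enough, here is a minimal pure-Python sketch with four hand-picked values (they are not from the question). 1.0 + 2**-53 lies exactly halfway between 1.0 and the next double, so round-half-to-even sends it back to 1.0; pairing the two small values first instead yields 2**-52, which survives the final addition.

```python
# Hand-picked values (not from the question): 2**-53 is half an ulp of 1.0.
a, b, c, d = 1.0, 2.0**-53, 2.0**-53, 2.0**-53

left_to_right = ((a + b) + c) + d  # each 1.0 + 2**-53 ties back to 1.0
paired = (a + b) + (c + d)         # c + d == 2**-52 exactly, which survives

print(left_to_right)           # 1.0
print(paired)                  # 1.0000000000000002
print(left_to_right - paired)  # -2.220446049250313e-16

# Adding zero is not the problem: x + 0.0 == x for each of these values.
print(all(x + 0.0 == x for x in (a, b, c, d)))  # True
```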
Long answer: NumPy implements pairwise summation to improve both speed (it allows for easier vectorization) and accuracy (it reduces rounding error).
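As a rough illustration of the accuracy side, here is a minimal textbook version of recursive pairwise summation in pure Python. It is a sketch, not NumPy's code: naive_sum, pairwise_sum and the base-case size of 8 are hypothetical choices, and math.fsum (which returns a correctly rounded sum) serves as the reference.

```python
import math


def naive_sum(values):
    # Plain left-to-right accumulation, like a simple for loop.
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values, lo=0, hi=None, block=8):
    # Textbook recursive pairwise summation: split the range in half,
    # sum each half, then add the two partial sums.
    if hi is None:
        hi = len(values)
    if hi - lo <= block:
        return naive_sum(values[lo:hi])
    mid = (lo + hi) // 2
    return pairwise_sum(values, lo, mid, block) + pairwise_sum(values, mid, hi, block)


values = [0.1] * 10**5
exact = math.fsum(values)  # correctly rounded sum of the stored doubles

print(abs(naive_sum(values) - exact))     # error of left-to-right summation
print(abs(pairwise_sum(values) - exact))  # error of pairwise summation
```

The worst-case rounding error of pairwise summation grows roughly with log n instead of n, which is why the second printed error should come out much smaller than the first.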
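The NumPy sum implementation is the C function pairwise_sum_@TYPE@ (@TYPE@ is a template placeholder filled in for each dtype). It essentially does the following:

1. If there are fewer than 8 elements, it simply adds them up in a for loop. This is why you don't see the effect for small M and Z: when M + Z < 8 (e.g. Z < 4 with your M = 4), the same for-loop summation is used for both arrays.
2. If there are between 8 and 128 elements, it accumulates the sums in 8 bins r[0]-r[7] (element i goes into bin i mod 8), then combines them as ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7])). Leftover elements beyond a multiple of 8 are added at the end.
3. Otherwise, it recursively sums the two halves of the array.

Therefore, in the first case you get a.sum() = a[0] + a[1] + a[2] + a[3], and in the second case b.sum() = (a[0] + a[1]) + (a[2] + a[3]) (the four bins holding zeros add nothing), which can make a.sum() - b.sum() nonzero.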
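The effect can be reproduced without NumPy by modeling the summation strategy in pure Python: fewer than 8 elements use a plain loop, up to 128 elements use eight running partial sums combined pairwise, and larger inputs are split in half recursively. This is a simplified sketch under stated assumptions, not NumPy's actual code: pairwise_sum_model is a hypothetical name, the block size of 128 and the halving rule follow the description of pairwise_sum_@TYPE@ in this answer, and strides, dtypes and SIMD are ignored. Padding four values with four zeros moves the length from 4 to 8, which switches from the plain loop to the eight-bin grouping.

```python
# Assumed block size, following the description of pairwise_sum_@TYPE@.
PW_BLOCKSIZE = 128


def pairwise_sum_model(values, lo=0, hi=None):
    """Hypothetical pure-Python model of NumPy's pairwise float summation."""
    if hi is None:
        hi = len(values)
    n = hi - lo
    if n < 8:
        # Case 1: plain left-to-right loop starting from 0.0
        res = 0.0
        for i in range(lo, hi):
            res += values[i]
        return res
    if n <= PW_BLOCKSIZE:
        # Case 2: eight running partial sums; the element at offset k
        # goes into bin k % 8 (leftovers excepted)
        r = [float(v) for v in values[lo:lo + 8]]
        i = 8
        while i < n - (n % 8):
            for j in range(8):
                r[j] += values[lo + i + j]
            i += 8
        res = ((r[0] + r[1]) + (r[2] + r[3])) + ((r[4] + r[5]) + (r[6] + r[7]))
        # Leftover elements when n is not a multiple of 8
        for k in range(lo + i, hi):
            res += values[k]
        return res
    # Case 3: split so the first half is a multiple of 8, then recurse
    n2 = n // 2
    n2 -= n2 % 8
    return pairwise_sum_model(values, lo, lo + n2) + pairwise_sum_model(values, lo + n2, hi)


a = [1.0, 2.0**-53, 2.0**-53, 2.0**-53]  # 4 values: case 1
b = a + [0.0] * 4                         # same values plus 4 zeros: case 2

print(pairwise_sum_model(a))                          # 1.0
print(pairwise_sum_model(b))                          # 1.0000000000000002
print(pairwise_sum_model(a) - pairwise_sum_model(b))  # -2.220446049250313e-16
print(sum(a) - sum(b))  # 0.0: the built-in sum always goes left to right
```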