In this example, the column-wise sum of an array pr
is computed in two different ways:
(a) take the sum over the first axis using p.sum
's axis
parameter
(b) slice the array along the the second axis and take the sum of each slice
import matplotlib.pyplot as plt
import numpy as np
m = 100
n = 2000
x = np.random.random_sample((m, n))
X = np.abs(np.fft.rfft(x)).T
frq = np.fft.rfftfreq(n)
total = X.sum(axis=0)
c = frq @ X / total
df = frq[:, None] - c
pr = df * X
a = np.sum(pr, axis=0)
b = [np.sum(pr[:, i]) for i in range(m)]
fig, ax = plt.subplots(1)
ax.plot(a)
ax.plot(b)
plt.show()
Both methods should return the same, but for whatever reason, in this example, they do not. As you can see in the plot below, a
and b
have totally different values. The difference is, however, so small that np.allclose(a, b)
is True.
If you replace pr
with some small random values, there is no difference between the two summation methods:
pr = np.random.randn(n, m) / 1e12
a = np.sum(pr, axis=0)
b = np.array([np.sum(pr[:, i]) for i in range(m)])
fig, ax = plt.subplots(1)
ax.plot(a)
ax.plot(b)
plt.show()
The second example indicates that the differences in the sums of the first example are not related to the summation methods. Then, is this a problem relate to floating point value summation? If so, why doesn't such an effect occure in the second example?
Why do the colum-wise sums differ in the first example, and which one is correct?
For why the results are different, see https://stackoverflow.com/a/55469395/7207392. The slice case uses pairwise summation, the axis case doesn't.
Which one is correct? Well, probably neither, but pairwise summation is expected to be more accurate.
Indeed, we can see that it is fairly close to the exact (within machine precision) result obtained using
math.fsum
.