what's the best way to calculate the average? With this question I want to know which algorithm for calculating the average is the best in a numerical sense. It should have the least rounding errors, should not be sensitive to over- or underflows and so on.
Thank you.
Additional information: incremental approaches preferred since the number of values may not fit into RAM (several parallel calculations on files larger than 4 GB).
You can have a look at http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.3535 (Nick Higham, "The accuracy of floating point summation", SIAM Journal of Scientific Computation, 1993).
If I remember it correctly, compensated summation (Kahan summation) is good if all numbers are positive, as least as good as sorting them and adding them in ascending order (unless there are very very many numbers). The story is much more complicated if some numbers are positive and some are negative, so that you get cancellation. In that case, there is an argument for adding them in descending order.
If you want an O(N) algorithm, look at Kahan summation.
Just to add one possible answer for further discussion:
Incrementally calculate the average for each step:
AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n
or pairwise combination
AVG_(n_a + n_b) = (n_a * AVG_a + n_b * AVG_b) / (n_a + n_b)
(I hope the formulas are clear enough)
Sort the numbers in ascending order of magnitude. Sum them, low magnitude first. Divide by the count.
I always use the following pseudocode:
float mean=0.0; // could use doulbe
int n=0; // could use long
for each x in data:
++n;
mean+=(x-mean)/n;
I don't have formal proofs of its stability but you can see that we won't have problems with numerical overflow, assuming that the data values are well behaved. It's referred to in Knuth's The Art of Computer Programming