I have been unable to find a way of getting the average value from a vector of integers in C++.
I can't simply add up all of the values, because the sum could exceed the maximum value an integer can hold.
How can I calculate this efficiently and quickly? Are there any standard libraries in C++ that do this?
There has been a lot of discussion about a sum that might be too big even for a uint64_t.
So follow this suggestion and stop worrying ...
I have used, can recommend, and am very impressed with the multi-precision library GMP, used through its C++ interface header "gmpxx.h".
I have used it for several interesting projects, including code that generates a big Fibonacci number with no apparent effort. It is easy to use and surprisingly quick, and there are examples of how to use it on the web.
Code snippet:
My output is a 2568 character (>1900 digits), comma delimited, big int value in < 20 ms.
2568 bytes
real 0m0.013s
user 0m0.004s
sys 0m0.000s
So how big is a uint64_t? I think the biggest Fib that fits in uint64_t is Fib(93).
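For reference, the largest uint64_t value is 18,446,744,073,709,551,615. The Fib(93) guess can be checked with a small sketch like the one below. It assumes the convention Fib(0) = 0, Fib(1) = 1, and the function name largest_fib_in_uint64 is made up for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <utility>

// Returns {n, Fib(n)} for the largest n whose Fibonacci number still fits
// in a uint64_t, assuming Fib(0) = 0 and Fib(1) = 1.
// (Hypothetical helper, not taken from the original answer.)
std::pair<unsigned, std::uint64_t> largest_fib_in_uint64() {
    std::uint64_t prev = 0;  // Fib(n - 1)
    std::uint64_t cur = 1;   // Fib(n)
    unsigned n = 1;
    // Continue only while prev + cur cannot exceed the uint64_t maximum.
    while (prev <= std::numeric_limits<std::uint64_t>::max() - cur) {
        const std::uint64_t next = prev + cur;
        prev = cur;
        cur = next;
        ++n;
    }
    return {n, cur};
}
```

The overflow check is done by comparing against max - cur before adding, so the addition itself never wraps around.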
The trick is that you don't have to store the entire sum of the vector. You can divide integers during iteration and store the remainder to add it to the next value.
This makes for an algorithm that is very memory efficient. I haven't benchmarked it, but it should be fine on processors that have a hardware division unit.
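The trick described above could be sketched roughly as follows. This is only an illustration under some assumptions: the input is a std::vector<int>, the result uses floor semantics (so the remainder is never negative), and the names MeanResult and mean_without_sum are invented here. This variant reduces each element modulo the vector size before adding it to the accumulator, so the accumulator always stays below twice the vector size.

```cpp
#include <vector>

// sum == quotient * n + remainder, with 0 <= remainder < n.
struct MeanResult {
    long long quotient;   // floor(sum / n)
    long long remainder;  // sum mod n
};

// Hypothetical helper: integer mean without ever forming the full sum.
// Returns {0, 0} for an empty vector (an arbitrary choice for this sketch).
MeanResult mean_without_sum(const std::vector<int>& values) {
    if (values.empty()) return {0, 0};
    const long long n = static_cast<long long>(values.size());
    long long mean = 0;  // running quotient
    long long accu = 0;  // running remainder, kept in [0, n)
    for (int el : values) {
        long long q = el / n;  // truncates toward zero
        long long r = el % n;  // has the sign of el
        if (r < 0) {           // normalise to floor division
            r += n;
            --q;
        }
        mean += q;
        accu += r;             // at most 2n - 2 here
        if (accu >= n) {       // carry a whole unit into the mean
            accu -= n;
            ++mean;
        }
    }
    return {mean, accu};
}
```

For example, for the values 1, 2, 3, 4 the sum is 10, so the quotient is 2 and the remainder is 2. No floating point is involved, and the exact fractional part can still be recovered as remainder / n.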
Here is a solution that shouldn't overflow as long as el + vector.size() fits into ACCU_T for each element el of the vector. It should be possible to remove this limitation by using the processor's overflow flag. It doesn't use any floating point or big numbers. The variable accu has the value sum(vec) % vec.size() at the end of the function.

Yep, here's a version for GCC and Clang that shouldn't overflow for any unsigned integer. (The exact constraint here is that el + vector.size() cannot be bigger than twice what ACCU_T can hold.)

The go-to approach is just summing with a sufficiently wide integer type using std::accumulate.

If this sum overflows (with 23 million ints, the average would have to be at least 4.01 x 10^11, which is to say it won't overflow, since that won't even fit in an int32_t ... so you're way good, but on the off chance you get several orders of magnitude more numbers, or have a wider int type), then you can instead use the common "online" algorithm for calculating the mean. This won't overflow and isn't very prone to loss of precision, but may be more expensive due to the repeated extra divisions.