Get the average value from a vector of Integers

2020-07-09 06:20发布

I have been unable to find a way of getting the average value from a vector of integers in C++.

I can't possibly start adding all of the values because I could exceed the maximum integer accepted value.

How can I calculate this efficiently and quickly ? Are there any standard libraries in the C++ language to do that ?

标签: c++ vector
3条回答
叼着烟拽天下
2楼-- · 2020-07-09 06:59

I have been unable to find a way of getting the average value from a vector of integers in C++. I can't possibly start adding all of the values because I could exceed the maximum integer accepted value. How can I calculate this efficiently and quickly ? Are there any standard libraries in the C++.

A lot of discussion of finding a sum that might be too big for even a uint64_t.

So follow this suggestion and stop worrying ...

I have used, can recommend, and am very impressed with the multi-precision C++ library called "gmpxx.h".

I have used it for several interesting efforts, including code to generate a big fibonacci with no apparent effort. It is easy to use, and surprisingly quick, and I have found on the web examples of how to use.

Code snippet:

mpz_class  n;   // a multi-precision integer, 
n = 1;          // easy initialize
size_t F = 1000; 
for (size_t i=1; i<=F; ++i)
   n = n * i;
// show
std::string Fstr = digiComma(n);  // inserts comma's
std::cout << "\n" << F << "! = " << Fstr
          << "\n" << digitCnt(Fstr) << " bytes " <<  std::endl;

My output is a 2568 character (>1900 digits), comma delimited, big int value in < 20 ms.

1000! = 402,387,260,077,093,773,543,702,433,923,003,985,719,374,864,210,714,632,543,799, 910,429,938,512,398,629,020,592,044,208,486,969,404,800,479,988,610,197,196,058, 631,666,872,994,808,558,901,323,829,669,944,590,997,424,504,087,073,759,918,823, 627,727,188,732,519,779,505,950,995,276,120,874,975,462,497,043,601,418,278,094, 646,496,291,056,393,887,437,886,487,337,119,181,045,825,783,647,849,977,012,476, 632,889,835,955,735,432,513,185,323,958,463,075,557,409,114,262,417,474,349,347, 553,428,646,576,611,667,797,396,668,820,291,207,379,143,853,719,588,249,808,126, 867,838,374,559,731,746,136,085,379,534,524,221,586,593,201,928,090,878,297,308, 431,392,844,403,281,231,558,611,036,976,801,357,304,216,168,747,609,675,871,348, 312,025,478,589,320,767,169,132,448,426,236,131,412,508,780,208,000,261,683,151, 027,341,827,977,704,784,635,868,170,164,365,024,153,691,398,281,264,810,213,092, 761,244,896,359,928,705,114,964,975,419,909,342,221,566,832,572,080,821,333,186, 116,811,553,615,836,546,984,046,708,975,602,900,950,537,616,475,847,728,421,889, 679,646,244,945,160,765,353,408,198,901,385,442,487,984,959,953,319,101,723,355, 556,602,139,450,399,736,280,750,137,837,615,307,127,761,926,849,034,352,625,200, 015,888,535,147,331,611,702,103,968,175,921,510,907,788,019,393,178,114,194,545, 257,223,865,541,461,062,892,187,960,223,838,971,476,088,506,276,862,967,146,674, 697,562,911,234,082,439,208,160,153,780,889,893,964,518,263,243,671,616,762,179, 168,909,779,911,903,754,031,274,622,289,988,005,195,444,414,282,012,187,361,745, 992,642,956,581,746,628,302,955,570,299,024,324,153,181,617,210,465,832,036,786, 906,117,260,158,783,520,751,516,284,225,540,265,170,483,304,226,143,974,286,933, 061,690,897,968,482,590,125,458,327,168,226,458,066,526,769,958,652,682,272,807, 075,781,391,858,178,889,652,208,164,348,344,825,993,266,043,367,660,176,999,612, 831,860,788,386,150,279,465,955,131,156,552,036,093,988,180,612,138,558,600,301, 435,694,527,224,206,344,631,797,460,594,682,573,103,790,084,024,432,438,465,657, 245,014,402,821,885,252,470,935,190,620,929,023,136,493,273,497,565,513,958,720, 559,654,228,749,774,011,413,346,962,715,422,845,862,377,387,538,230,483,865,688, 976,461,927,383,814,900,140,767,310,446,640,259,899,490,222,221,765,904,339,901, 886,018,566,526,485,061,799,702,356,193,897,017,860,040,811,889,729,918,311,021, 171,229,845,901,641,921,068,884,387,121,855,646,124,960,798,722,908,519,296,819, 372,388,642,614,839,657,382,291,123,125,024,186,649,353,143,970,137,428,531,926, 649,875,337,218,940,694,281,434,118,520,158,014,123,344,828,015,051,399,694,290, 153,483,077,644,569,099,073,152,433,278,288,269,864,602,789,864,321,139,083,506, 217,095,002,597,389,863,554,277,196,742,822,248,757,586,765,752,344,220,207,573, 630,569,498,825,087,968,928,162,753,848,863,396,909,959,826,280,956,121,450,994, 871,701,244,516,461,260,379,029,309,120,889,086,942,028,510,640,182,154,399,457, 156,805,941,872,748,998,094,254,742,173,582,401,063,677,404,595,741,785,160,829, 230,135,358,081,840,096,996,372,524,230,560,855,903,700,624,271,243,416,909,004, 153,690,105,933,983,835,777,939,410,970,027,753,472,000,000,000,000,000,000,000, 000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, 000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, 000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, 000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

2568 bytes

real 0m0.013s

user 0m0.004s

sys 0m0.000s

So how big is a uint64_t? I think the biggest Fib that fits in uint64_t is Fib(93).

查看更多
祖国的老花朵
3楼-- · 2020-07-09 06:59

The trick is that you don't have to store the entire sum of the vector. You can divide integers during iteration and store the remainder to add it to the next value.

This allows to create algorithm that is very memory efficient. I didn't make a benchmark but it should be OK for processors that have hardware division module.

Here a solution that shouldn't overflow as long as el + vector.size() fits into ACCU_T for each element of the vector. It should be possible to remove this limitation if we use processor overflow flag.

template<typename T, typename ACCU_T = uintmax_t>
T vec_average(const std::vector<T> &vec)
{
    const ACCU_T size = (ACCU_T)vec.size();
    T avg = 0;
    ACCU_T accu = 0;
    for (const T &el : vec)
    {
        accu += (ACCU_T)el;
        avg += (T)(accu / size);
        accu %= size;
    }
    return avg;
}

It doesn't use any floating point or big numbers. The variable of accu has a value of sum(vec) % vec.size() at the end of function.


Yep, here's a version for GCC and Clang that shouldn't overflow for any unsigned integer.

(The exact constraint here is that el + vector.size() cannot be bigger that 2 times as much as ACCU_T can fit.)

template<typename T, typename ACCU_T = uintmax_t>
T vec_average(const std::vector<T> &vec)
{
    const ACCU_T size = (ACCU_T)vec.size();
    const T overflowAvg = (T)((ACCU_T(-1)) / size);
    const ACCU_T overflowAccu = overflowAvg * size;
    T avg = 0;
    ACCU_T accu = 0;
    for (const T &el : vec)
    {
        if (__builtin_add_overflow(accu, (ACCU_T)el, &accu))
        {
            avg += overflowAvg;
            accu -= overflowAccu;
        }
        avg += (T)(accu / size);
        accu %= size;
    }
    return avg;
}
查看更多
爷的心禁止访问
4楼-- · 2020-07-09 07:04

The go-to approach is just summing with a sufficiently wide integer type with std::accumulate:

double avg1(std::vector<int> const& v) {
    return 1.0 * std::accumulate(v.begin(), v.end(), 0LL) / v.size();
}

If this sum overflows (with 23 million ints, the average would have to be at least 4.01x1011 - which is to say, it won't overflow since that won't even fit in an int32_t... so you're way good, but on the off chance you get several orders of magnitude more numbers, or have wider a int type), then you can instead use the common "online" algorithm for calculating the mean:

double avg2(std::vector<int> const& v) {
    int n = 0;
    double mean = 0.0;
    for (auto x : v) {
        double delta = x - mean;
        mean += delta/++n;
    }
    return mean;
}

This won't overflow, isn't very prone to loss of precision, but may be more expensive due to repeated extra divisions.

查看更多
登录 后发表回答