C++ vector array operator high computational cost?

2019-04-23 05:05发布

I have always known that the rich abstractions of C++ come with a certain computational overhead but I was under the impression that this overhead would be close to negligible once the correct compiler optimisations were applied. I was curious as to what exactly the magnitude of this overhead would be, so I wrote a simple test to determine this. The test is a templated function which takes a container variable, assigns a value to each element in the container and then sums the values across the container in a separate loop. This process is repeated for a preset number of cycles.

What I found, to my considerable unease, was that the vector implementation took nearly 3 times the standard array implementation. After permuting through a vast selection of compiler optimizations without any success, I decided to bite the bullet and eyeball the assembly code directly to try and see what was causing the time penalty. I included some assembly directives which allowed me to pinpoint exactly where the array indexing operation occurred and examined the code in detail. What I found, to my complete confusion, was that the difference between the vector implementation and the array implementation was utterly insignificant. The assembly code can be found here.

This is the command I used to build the binary:

g++ -O3 vectorArrayOp.cpp -o vectorArrayOp

This is the command I used to build the assembly:

g++ -O3 -DTAGASM vectorArrayOp.cpp -S -o vectorArrayOp.s

This is the output I observe from running the binary:

gmurphy@interloper:Reference$ ./vectorArrayOp 
Duration 0.027678
Duration 0.090212

The results are unchanged when you include the computed value in the stdout stream, I removed them for clarity. My system specifications are as follows (I've seen the same results on my AMD too):

Linux 3.2.0-32-generic x86_64 GNU/Linux
Intel(R) Xeon(R) CPU X5550  @ 2.67GH
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

The code follows, I would appreciate if someone could provide me with some insight into why the timings are so different when the assembly is so similar.

#include <vector>
#include <iostream>
#include <sys/time.h>
#ifdef TAGASM
#define ASMTAG(X) asm(X)
#else
#define ASMTAG(X)
#endif 
enum { DataSize=1024, NumTests=(1<<16) } ;
struct ReturnValue {ReturnValue(float _d, int _t):d(_d), t(_t){} float d; int t;} ;
template <typename Container, typename Type>
ReturnValue runTest(Container &c, Type value)
{
    int tagValue(0);
    timeval startTime;
    gettimeofday(&startTime, NULL);
    for(int i=0; i<NumTests; i++)
    {
        for(int j=0; j<DataSize; j++)
        {
            ASMTAG("preassign");
            c [j] = value ;
            ASMTAG("postassign");
        }
        for(int j=0; j<DataSize; j++)
        {
            ASMTAG("preadd");
            tagValue += c [j] ;
            ASMTAG("postadd");
        }
    }
    timeval endTime;
    gettimeofday(&endTime, NULL);
    float duration((endTime.tv_sec-startTime.tv_sec)+
                   (endTime.tv_usec-startTime.tv_usec)/1000000.0);
    //tagValue is returned in case the optimising compiler might try to remove the loops
    return ReturnValue(duration, tagValue) ;
}
int main()
{
    int *arrayData = new int [DataSize];
    std::vector <int> vectorData(DataSize, 0) ;
    ReturnValue ad = runTest(arrayData, 1);
    ReturnValue vd = runTest(vectorData, 1);
    std::cout<<"Duration "<<ad.d<<std::endl;
    std::cout<<"Duration "<<vd.d<<std::endl;
    delete [] arrayData;
    return 0 ;
}

1条回答
看我几分像从前
2楼-- · 2019-04-23 05:19
% g++-4.4 -O3 vectorArrayOp.cpp -o vectorArrayOp
% ./vectorArrayOp
Duration 0.008581
Duration 0.008775
% g++-4.5 -O3 vectorArrayOp.cpp -o vectorArrayOp
% ./vectorArrayOp
Duration 0.008634
Duration 0.008588
% g++-4.6 -O3 vectorArrayOp.cpp -o vectorArrayOp
% ./vectorArrayOp
Duration 0.01731
Duration 0.081696
% g++-4.7 -O3 vectorArrayOp.cpp -o vectorArrayOp
% ./vectorArrayOp
Duration 0.008618
Duration 0.008612
% clang++ -O3 vectorArrayOp.cpp -o vectorArrayOp
% ./vectorArrayOp
Duration 0.066484
Duration 0.066435

Based on these results, this is probably a compiler specific performance regression in g++ 4.6.

查看更多
登录 后发表回答