Is std::vector so much slower than plain arrays?

Posted 2018-12-31 15:34

I've always taken it as general wisdom that std::vector is "implemented as an array," blah blah blah. Today I went and tested it, and it seems that is not so.

Here are some test results:

UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds

That's about 3 - 4 times slower! That doesn't really square with the "vector may be slower for a few nanosecs" comments.

And the code I used:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

Am I doing it wrong or something? Or have I just busted this performance myth?

I'm using Release mode in Visual Studio 2005.


In Visual C++, #define _SECURE_SCL 0 cuts the UseVector time in half (bringing it down to 4 seconds). This is really huge, IMO.
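
For anyone reproducing this: _SECURE_SCL controls the checked iterators in the VC++ 2005/2008 standard library, and the define has to appear before the first standard header is included. A minimal sketch:

// Disable checked iterators in the VC++ 2005/2008 STL; Release builds
// there still default to _SECURE_SCL=1. Must precede any standard include.
#define _SECURE_SCL 0
#include <vector>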

21 Answers
ら面具成の殇う
#2 · 2018-12-31 15:47

I just want to mention that vector (and smart_ptr) is just a thin layer added on top of raw arrays (and raw pointers). In fact, the access time of a vector in contiguous memory is faster than that of an array. The following code shows the results of initializing and accessing a vector and an array.

#include <boost/date_time/posix_time/posix_time.hpp>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
#define SIZE 20000

using namespace std;

int main() {
    srand(time(NULL));
    vector<vector<int>> vector2d;
    vector2d.reserve(SIZE);
    int index(0);
    boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        vector2d.push_back(vector<int>(SIZE));
    }
    boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand() % SIZE;
        for (int j = 0; j < SIZE; j++) {
            vector2d[index][index]++;
        }
    }
    boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration msdiff = end - start_total;
    cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_access;
    cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n";

    int** raw2d = new int*[SIZE];
    start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        raw2d[i] = new int[SIZE];
    }
    start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand() % SIZE;
        for (int j = 0; j < SIZE; j++) {
            raw2d[index][index]++;
        }
    }
    end = boost::posix_time::microsec_clock::local_time();
    msdiff = end - start_total;
    cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_access;
    cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    for (int i = 0; i < SIZE; i++) {
        delete [] raw2d[i];
    }
    delete [] raw2d;
    return 0;
}

The output is:

    Vector total time: 925milliseconds.
    Vector access time: 4milliseconds.
    Array total time: 30milliseconds.
    Array access time: 21milliseconds.

So the speed will be almost the same if you use it properly (as others mentioned, by using reserve() or resize()).

看风景的人
#3 · 2018-12-31 15:47

When you push data into a std::vector, it may have to reallocate its storage with more memory. That allocation and deallocation process is what consumes most of the time.

Plain arrays, on the other hand, just store the data and never reallocate or deallocate themselves.
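
To illustrate the difference, here is a minimal sketch (hypothetical helper functions, not taken from the question's benchmark): without reserve(), each push_back may reallocate and copy everything stored so far; with reserve(), the buffer is allocated once up front, much like a plain array.

#include <cstddef>
#include <vector>

struct Pixel { unsigned char r, g, b; };

void grow_without_reserve(std::size_t n)
{
    std::vector<Pixel> v;
    // Capacity grows geometrically; each reallocation copies/moves
    // every element already stored.
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(Pixel{255, 0, 0});
}

void grow_with_reserve(std::size_t n)
{
    std::vector<Pixel> v;
    v.reserve(n);  // one allocation up front, like a plain array
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(Pixel{255, 0, 0});
}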

NOTE: This information could be wrong; I picked it up somewhere on Quora.

明月照影归
#4 · 2018-12-31 15:48

I have to say I'm not an expert in C++, but let me add some experimental results:

compile: gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp

machine:

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 

OS:

2.6.32-642.13.1.el6.x86_64

Output:

UseArray completed in 0.167821 seconds
UseVector completed in 0.134402 seconds
UseConstructor completed in 0.134806 seconds
UseFillConstructor completed in 1.00279 seconds
UseVectorPushBack completed in 6.6887 seconds
The whole thing completed in 8.12888 seconds

The only thing I find strange here is the performance of "UseFillConstructor" compared with "UseConstructor".

The code:

void UseConstructor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension);
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}


void UseFillConstructor()
{
    TestTimer t("UseFillConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
    }
}

So the additional "value" provided slows performance down quite a lot, which I think is due to the repeated calls to the copy constructor. But...

Compile:

gcc-6.2.0/bin/g++ -std=c++14 -O vector.cpp

Output:

UseArray completed in 1.02464 seconds
UseVector completed in 1.31056 seconds
UseConstructor completed in 1.47413 seconds
UseFillConstructor completed in 1.01555 seconds
UseVectorPushBack completed in 6.9597 seconds
The whole thing completed in 11.7851 seconds

So in this case, gcc optimization is very important, but it can't help you much when a default value is provided. This actually goes against my intuition. Hopefully it helps new programmers choose a vector initialization form.
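
One way to check the copy-constructor hypothesis above is to count constructions directly. This is a sketch with a hypothetical CountingPixel, not part of the benchmark: since C++11, vector(n) constructs each element in place via value-initialisation, while vector(n, value) copy-constructs every element from the supplied value.

#include <cstdio>
#include <vector>

// Hypothetical instrumented type: counts how each element is constructed.
struct CountingPixel
{
    static long defaulted, copied;
    unsigned char r, g, b;
    CountingPixel() : r(0), g(0), b(0) { ++defaulted; }
    CountingPixel(const CountingPixel& o) : r(o.r), g(o.g), b(o.b) { ++copied; }
};
long CountingPixel::defaulted = 0;
long CountingPixel::copied = 0;

int main()
{
    std::vector<CountingPixel> a(1000);                   // UseConstructor-style
    std::printf("size only: default=%ld copy=%ld\n",
                CountingPixel::defaulted, CountingPixel::copied);

    CountingPixel::defaulted = CountingPixel::copied = 0;
    std::vector<CountingPixel> b(1000, CountingPixel());  // UseFillConstructor-style
    std::printf("fill:      default=%ld copy=%ld\n",
                CountingPixel::defaulted, CountingPixel::copied);
    return 0;
}

Whether that alone explains the -O3 gap is a separate question, but it does show the extra copy-constructor traffic the fill form generates.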

初与友歌
#5 · 2018-12-31 15:49

It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought, let's get constructors and destructors being called on all tests; and then compare.

const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}

My thought was that, with this setup, they should be exactly the same. It turns out I was wrong.

UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds

So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.

My thought was that the cost lies in how the vector initialises all of its values from the default constructor. So I performed a test:

#include <cstdio>
#include <vector>

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);

    return 0;
}

The results were as I suspected:

Default Constructed: 1
Copy Constructed: 300

This is clearly the source of the slowdown: the vector uses the copy constructor to initialise the elements from a default-constructed object.

This means that the following pseudo-operation sequence happens during construction of the vector:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;

Which, due to the implicit copy constructor made by the compiler, is expanded to the following:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}

So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.

Compare this to the alternative situation with new[]/delete[]:

int main() {
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0

They are all left to their un-initialised values, and without the double iteration over the sequence.

Armed with this information, how can we test it? Let's try over-writing the implicit copy constructor.

Pixel(const Pixel&) {}
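
For clarity, this is the Pixel struct from the question with that no-op copy constructor added (a sketch; the members are deliberately left uninitialised so the copy does no work):

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    // No-op copy constructor: copies nothing, so the vector's
    // element-by-element copy from the default Pixel becomes free.
    Pixel(const Pixel&) {}

    unsigned char r, g, b;
};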

And the results?

UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds

So in summary, if you're making hundreds of vectors very often: re-think your algorithm.

In any case, the STL implementation isn't slower for some unknown reason, it just does exactly what you ask; hoping you know better.

心情的温度
#6 · 2018-12-31 15:51

To be fair, you cannot compare a C++ implementation to a C implementation, as I would call your malloc version. malloc does not create objects - it only allocates raw memory. That you then treat that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).

That said, simply changing the malloc to new Pixel[dimensions*dimensions] and free to delete [] pixels does not make much difference with the simple implementation of Pixel that you have. Here are the results on my box (E6600, 64-bit):

UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds
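
For reference, the new/delete variant of UseArray described above might look like this (a sketch based on the question's code):

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        // new[] default-constructs every Pixel (a no-op here), unlike malloc
        Pixel * pixels = new Pixel[dimension * dimension];

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        delete [] pixels;
    }
}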

But with a slight change, the tables turn:

Pixel.h

struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

Pixel.cc

#include "Pixel.h"

Pixel::Pixel() {}
Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) 
  : r(r), g(g), b(b) {}

main.cc

#include "Pixel.h"
[rest of test harness without class Pixel]
[UseArray now uses new/delete not malloc/free]

Compiled this way:

$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o

we get very different results:

UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds

With a non-inlined constructor for Pixel, std::vector now beats a raw array.

It would appear that the complexity of allocation through std::vector and std::allocator is too much to be optimised as effectively as a simple new Pixel[n]. However, we can see that the problem is simply with the allocation, not the vector access, by tweaking a couple of the test functions to create the vector/array once by moving it outside the loop:

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

and

void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
    delete [] pixels;
}

We get these results now:

UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds

What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object will be more time consuming than creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.

公子世无双
#7 · 2018-12-31 15:52

My laptop is a Lenovo G770 (4 GB RAM).

The OS is Windows 7 64-bit (the one that came with the laptop).

The compiler is MinGW 4.6.1.

The IDE is Code::Blocks.

I tested the source code from the first post.

The results

O2 optimization

UseArray completed in 2.841 seconds

UseVector completed in 2.548 seconds

UseVectorPushBack completed in 11.95 seconds

The whole thing completed in 17.342 seconds

system pause

O3 optimization

UseArray completed in 1.452 seconds

UseVector completed in 2.514 seconds

UseVectorPushBack completed in 12.967 seconds

The whole thing completed in 16.937 seconds

It looks like the performance of vector is worse under O3 optimization.

If you change the loop to

    pixels[i].r = i;
    pixels[i].g = i;
    pixels[i].b = i;

The speed of array and vector under O2 and O3 is almost the same.
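
For reference, the modified UseVector would look like this (a sketch; UseArray changes the same way). The values now depend on the loop counter instead of being constants, which presumably makes it harder for the optimizer to collapse the fill into something memset-like, though that explanation is an assumption.

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            // The stored value now varies with i (implicitly narrowed to unsigned char).
            pixels[i].r = i;
            pixels[i].g = i;
            pixels[i].b = i;
        }
    }
}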
