I've always taken it as general wisdom that std::vector
is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems not to be so.
Here are some test results:
UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
That's about 3-4 times slower! That doesn't really square with the "vector
may be slower for a few nanosecs" comments.
And the code I used:
#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}
Am I doing it wrong or something? Or have I just busted this performance myth?
I'm using Release mode in Visual Studio 2005.
In Visual C++, #define _SECURE_SCL 0 reduces UseVector by half (bringing it down to 4 seconds). This is really huge, IMO.
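For reference, a minimal sketch of where the define has to go: it must appear before any standard library header is included, or it has no effect (and note that from VS2010 onward it was superseded by _ITERATOR_DEBUG_LEVEL):

#define _SECURE_SCL 0   // must precede the standard headers
#include <vector>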
I just want to mention that vector (and smart_ptr) is just a thin layer added on top of raw arrays (and raw pointers). The access time of a vector, whose elements sit in contiguous memory, is on par with that of an array. The kind of code sketched below times initialising and accessing a vector and an array.
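The original snippet and its timing output aren't reproduced here; a minimal sketch of such an initialise-and-access benchmark (sizes and names are my own) could be:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    const std::size_t n = 10000000;
    typedef std::chrono::steady_clock clock;

    // Raw heap array: allocate, then time the element writes.
    int * arr = new int[n];
    clock::time_point t0 = clock::now();
    for (std::size_t i = 0; i < n; ++i) arr[i] = static_cast<int>(i);
    clock::time_point t1 = clock::now();

    // Vector of the same size: allocate/initialise, then time the writes.
    std::vector<int> vec(n);
    clock::time_point t2 = clock::now();
    for (std::size_t i = 0; i < n; ++i) vec[i] = static_cast<int>(i);
    clock::time_point t3 = clock::now();

    std::cout << "array access:  " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "vector access: " << std::chrono::duration<double>(t3 - t2).count() << " s\n";

    delete [] arr;
    return 0;
}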
So the speed will be almost the same if you use the vector properly (as others mentioned, using reserve() or resize()).
A std::vector reallocates its storage as you push data into it: when the capacity runs out, it allocates a larger block and copies the existing elements over, and it is this allocation and deallocation work that consumes most of the time. Plain arrays, on the other hand, just store the data and never reallocate or deallocate themselves.
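To watch those reallocations happen, here is a small illustrative program (my own addition, not from the answer) that prints the capacity every time push_back triggers a new, larger allocation:

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t cap = v.capacity();

    for (int i = 0; i < 1000; ++i)
    {
        v.push_back(i);
        if (v.capacity() != cap)   // capacity changed: a reallocation just occurred
        {
            cap = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << cap << std::endl;
        }
    }
    return 0;
}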
I have to say I'm not an expert in C++, but to add some experimental results:
compile: gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp
The only thing that strikes me as strange here is how "UseFillConstructor" performs compared with "UseConstructor".
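The original listing isn't shown; presumably the two variants looked something like this (a sketch, reusing the question's Pixel and TestTimer):

void UseConstructor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        std::vector<Pixel> pixels(dimension * dimension);   // default-fill
    }
}

void UseFillConstructor()
{
    TestTimer t("UseFillConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;
        // fill with an explicit value: every element is copied from this Pixel
        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}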
So the additional "value" provided slows performance down quite a lot, which I think is due to multiple calls to the copy constructor. But...
Compiling the same code without optimization tells a different story. So in this case, gcc optimization is very important, but it can't help you much when a value is provided as a default; this is against my intuition, actually. Hopefully it helps new programmers when choosing a vector initialization format.
It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought: let's get constructors and destructors called in all tests, and then compare (the adjusted UseArray is sketched below).
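A sketch of the adjusted UseArray, assuming the change was simply to swap malloc/free for new[]/delete[] so that constructors and destructors run in every test (this reuses the question's Pixel and TestTimer):

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        // new[] default-constructs every Pixel; delete[] runs the destructors
        Pixel * pixels = new Pixel[dimension * dimension];

        for(int j = 0; j < dimension * dimension; ++j)
        {
            pixels[j].r = 255;
            pixels[j].g = 0;
            pixels[j].b = 0;
        }

        delete [] pixels;
    }
}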
My thoughts were that, with this setup, they should be exactly the same. It turns out I was wrong.
So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.
My thoughts were that the cost lies in how the loop initialises all values via the default constructor. So I performed a test:
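The original test isn't reproduced here; a reconstruction in the same spirit counts constructor calls to show how vector<T>(n) fills its elements:

#include <iostream>
#include <vector>

struct Counted
{
    static int defaults;
    static int copies;

    Counted() { ++defaults; }
    Counted(const Counted &) { ++copies; }
};

int Counted::defaults = 0;
int Counted::copies = 0;

int main()
{
    std::vector<Counted> v(1000);
    // Under C++03, vector<T>(n) is defined as vector(n, T()): this prints
    // 1 default construction and 1000 copies. (C++11 changed it to
    // value-initialise each element in place instead.)
    std::cout << "default ctors: " << Counted::defaults
              << ", copy ctors: " << Counted::copies << std::endl;
    return 0;
}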
The results were as I suspected: the default constructor runs once, and the copy constructor runs once for every element.
This is clearly the source of the slowdown: the fact that the vector uses the copy constructor to initialise the elements from a default-constructed object.
This means that the following pseudo-operation order is happening during construction of the vector:
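Reconstructed from the description (C++03 semantics; the names are illustrative):

Pixel defaultPixel;                          // 1. default-construct one Pixel (members uninitialised)
for (int i = 0; i < n; ++i)
    construct element[i] from defaultPixel;  // 2. copy-construct every element from it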
Which, due to the implicit copy constructor made by the compiler, is expanded to roughly the following:
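Again reconstructed; the implicit copy constructor is just a member-wise copy:

Pixel defaultPixel;                  // r, g, b hold indeterminate values
for (int i = 0; i < n; ++i)
{
    element[i].r = defaultPixel.r;   // member-wise copy of indeterminate values
    element[i].g = defaultPixel.g;
    element[i].b = defaultPixel.b;
}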
So the default Pixel remains uninitialised, while the rest are initialised with the default Pixel's uninitialised values. Compare this to the alternative situation with new[]/delete[]: there, all the elements are left to their uninitialised values, without the double iteration over the sequence.
Armed with this information, how can we test it? Let's try overwriting the implicit copy constructor:
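A sketch of that experiment (the original code isn't shown): give Pixel a copy constructor that deliberately does nothing, so the per-element copy in the fill loop becomes a no-op:

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    Pixel(const Pixel &)   // replace the implicit member-wise copy with a no-op
    {
    }

    unsigned char r, g, b;
};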
And the results? With the no-op copy constructor in place, the overhead disappeared, which confirms the copy construction as the culprit.
So in summary, if you're making hundreds of vectors very often: re-think your algorithm.
In any case, the STL implementation isn't slower for some unknown reason; it just does exactly what you ask, hoping you know better.
To be fair, you cannot compare a C++ implementation to a C implementation, as I would call your malloc version. malloc does not create objects - it only allocates raw memory. That you then treat that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).
That said, simply changing the malloc to new Pixel[dimension * dimension] and the free to delete [] pixels does not make much difference with the simple implementation of Pixel that you have: on my box (E6600, 64-bit), the timings come out essentially the same. But with a slight change, the tables turn.
The change is to take Pixel's constructors out of line, splitting the code into Pixel.h, Pixel.cc and main.cc.
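The original listings aren't reproduced; a reconstruction of the split, whose whole point is to keep the constructor bodies out of the translation unit running the benchmark, would be:

// Pixel.h: declarations only, no constructor bodies visible to callers
struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

// Pixel.cc: the out-of-line definitions
#include "Pixel.h"

Pixel::Pixel()
{
}

Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}

// main.cc: the benchmark itself, unchanged apart from #include "Pixel.h"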
Compiling this way, with Pixel.cc and main.cc as separate translation units so the constructor bodies are not visible at the point of use, gives very different results.
With a non-inlined constructor for Pixel, std::vector now beats a raw array.
It would appear that the complexity of allocation through std::vector and std::allocator is too much to be optimised as effectively as a simple new Pixel[n]. However, we can see that the problem lies simply with the allocation, not the vector access, by tweaking a couple of the test functions to create the vector/array once, moving it outside the loop (sketched below). With that change, the two timings come out comparable.
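A sketch of the tweaked UseVector (reconstructed; hoisting the allocation out of the timed loop is the only intended difference, and UseArray gets the matching treatment):

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);   // allocate once, before the loop

    for(int i = 0; i < 1000; ++i)
    {
        for(int j = 0; j < dimension * dimension; ++j)
        {
            pixels[j].r = 255;
            pixels[j].g = 0;
            pixels[j].b = 0;
        }
    }
}

// UseArray is tweaked the same way: one allocation before the loop,
// one deallocation after it.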
What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object will be more time consuming than creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.
My laptop is a Lenovo G770 (4 GB RAM).
The OS is Windows 7 64-bit (the one that came with the laptop).
The compiler is MinGW 4.6.1.
The IDE is Code::Blocks.
I tested the source code from the first post.
The results:
O2 optimization:
UseArray completed in 2.841 seconds
UseVector completed in 2.548 seconds
UseVectorPushBack completed in 11.95 seconds
The whole thing completed in 17.342 seconds
O3 optimization:
UseArray completed in 1.452 seconds
UseVector completed in 2.514 seconds
UseVectorPushBack completed in 12.967 seconds
The whole thing completed in 16.937 seconds
It looks like vector's performance relative to the array gets worse under O3 optimization: the array timing improves substantially while the vector timing barely moves.
If you change the loop body, the speed of array and vector under O2 and O3 becomes almost the same.
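One plausible variant (an assumption on my part, since the answer's snippet isn't reproduced) is to assign a whole Pixel per element rather than setting the three members separately:

for(int i = 0; i < dimension * dimension; ++i)
    pixels[i] = Pixel(255, 0, 0);   // single aggregate assignment per element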