In another thread I started a discussion about Vectors and Arrays, in which I was largely playing devil's advocate, to push buttons. However, during the course of this, I stumbled onto a test case that has me a little perplexed, and I would like to have a real discussion about it, over the "abuse" I'm getting for playing devil's advocate, starting a real discussion on that thread is now impossible. However, the particular example has me intrigued, and I cannot explain it to myself satisfactorily.
The discussion is about the general performance of Vector vs Arrays, ignoring dynamic elements. Ex: obviously continually using push_back() in a vector is going to slow it down. We're assuming that the vector and array are pre-populated with data. The example I presented, and subsequently modified by an individual in the thread, was the following:
#include <iostream>
#include <vector>
#include <type_traits>
using namespace std;
const int ARRAY_SIZE = 500000000;
// http://stackoverflow.com/a/15975738/500104
template <class T>
class no_init_allocator
{
public:
typedef T value_type;
no_init_allocator() noexcept {}
template <class U>
no_init_allocator(const no_init_allocator<U>&) noexcept {}
T* allocate(std::size_t n)
{return static_cast<T*>(::operator new(n * sizeof(T)));}
void deallocate(T* p, std::size_t) noexcept
{::operator delete(static_cast<void*>(p));}
template <class U>
void construct(U*) noexcept
{
// libstdc++ doesn't know 'is_trivially_default_constructible', still has the old names
static_assert(is_trivially_default_constructible<U>::value,
"This allocator can only be used with trivally default constructible types");
}
template <class U, class A0, class... Args>
void construct(U* up, A0&& a0, Args&&... args) noexcept
{
::new(up) U(std::forward<A0>(a0), std::forward<Args>(args)...);
}
};
int main() {
srand(5); //I use the same seed, we just need the random distribution.
vector<char, no_init_allocator<char>> charArray(ARRAY_SIZE);
//char* charArray = new char[ARRAY_SIZE];
for(int i = 0; i < ARRAY_SIZE; i++) {
charArray[i] = (char)((i%26) + 48) ;
}
for(int i = 0; i < ARRAY_SIZE; i++) {
charArray[i] = charArray[rand() % ARRAY_SIZE];
}
}
When I run this on my machine, I get the following terminal output. The first run is with the vector line uncommented, the second is with the array line uncommented. I used the highest level of optimization, to give the vector the best chance of success. Below are my results, the first two runs with the array line uncommented, the second two with the vector line.
//Array run # 1
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out
real 0m20.287s
user 0m20.068s
sys 0m0.175s
//Array run # 2
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out
real 0m21.504s
user 0m21.267s
sys 0m0.192s
//Vector run # 1
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out
real 0m28.513s
user 0m28.292s
sys 0m0.178s
//Vector run # 2
clang++ -std=c++11 -stdlib=libc++ -o3 some.cpp -o b.out && time ./b.out
real 0m28.607s
user 0m28.391s
sys 0m0.178s
That arrays outperform vectors does not surprise me, however, that the difference is on the order of 50% surprises me very much, I would expect that they would be negligible, and I feel like the nature of this test case me be obscuring the nature of the results. When you run this test on array sizes that are smaller, the performance differences dissipate dramatically.
My explanation:
The additional implementation instructions for vector are causing the vector instructions to align in memory poorly, perhaps even on this example, a split at a really bad point on 2 different "blocks". This is causing memory to jump back and forth between levels of cache vs data cache vs instruction cache more frequently than you would expect. I also suspect that the LLVM compiler may be exaggerating weaknesses, and optimizing poorly, due to some of the newer C++11 elements, though I have no reason for either of these explanations besides hypothesis and conjecture.
I am interested if A: that someone can replicate my results and B: if someone has a better explanation for how the computer is running this particular benchmark and why vector is so dramatically underperforming arrays in this instance.
My set up: http://www.newegg.com/Product/Product.aspx?Item=N82E16834100226
A simpler explanation: you're building with optimisations disabled. You want
-O3
, not-o3
.I don't have clang available to exactly reproduce your tests, but my results are as follows:
I can guarantee that LLVM does infact misoptimize std::vector (if you are in fact optimising at all), at least as of right now. It does not correctly inline many of the function calls involved. You will get better performance with GCC.