I have got here two programs with me, both are doing exactly the same task. They are just setting an boolean array / vector to the value true. The program using vector
takes 27 seconds to run whereas the program involving array with 5 times greater size takes less than 1 s. I would like to know the exact reason as to why there is such a major difference ? Are vectors
really that inefficient ?
Program using vectors
#include <iostream>
#include <vector>
#include <ctime>
using namespace std;
int main(){
const int size = 2000;
time_t start, end;
time(&start);
vector<bool> v(size);
for(int i = 0; i < size; i++){
for(int j = 0; j < size; j++){
v[i] = true;
}
}
time(&end);
cout<<difftime(end, start)<<" seconds."<<endl;
}
Runtime - 27 seconds
Program using Array
#include <iostream>
#include <ctime>
using namespace std;
int main(){
const int size = 10000; // 5 times more size
time_t start, end;
time(&start);
bool v[size];
for(int i = 0; i < size; i++){
for(int j = 0; j < size; j++){
v[i] = true;
}
}
time(&end);
cout<<difftime(end, start)<<" seconds."<<endl;
}
Runtime - < 1 seconds
Platform - Visual Studio 2008
OS - Windows Vista 32 bit SP 1
Processor Intel(R) Pentium(R) Dual CPU T2370 @ 1.73GHz
Memory (RAM) 1.00 GB
Thanks
Amare
You're using std::vector of bool and that's not what you think it is!
vector of bool is a bastard child template specialization that should never have existed and actually stores 1 bool in each bit. Accesses into it are more complicated due to masking and shifting logic, so will definitely be somewhat slower.
Click here for some info on vector of bool.
Also, you may be running an unoptimized build (almost certainly given the times you listed, 27 seconds is outrageous for 4 million iterations). The Standard Template Library relies very heavily on the optimizer to do things like inline function calls and elide temporaries. Lack of this optimization causes an especially heavy performance degradation for vector of bool because it has to return a proxy object when you index into it, because you can't take the address of a bit, so operator [] can't return a reference.
Click here for more info on proxied containers (The last half is about vector of bool)
In addition many STL implementations have helpful debugging bits that are not part of the standard which help you catch bugs, but really drag performance down. You'll want to make sure those are disabled in your optimized build.
Once you turn on the optimizer, have the right settings (i.e. no STL debugging turned on), and are actually testing the same thing in both loops, you will see almost no difference.
I had to make your loop much larger to test on my machine, but here's two builds of your vector of bool loop on my machine, showing the impact of optimizer flags on STL code
$ g++ main.cpp
$ ./a.out
17 seconds.
$ g++ -O2 main.cpp
$ ./a.out
1 seconds.
Have a look at this
Using arrays or std::vectors in C++, what’s the performance gap?
std::vector<bool>
is optimized for memory consumption rather than performance.
You may trick it by using std::vector<int>
. You should not have performance drawbacks then.
Other answers are very good, but you could easily answer it yourself by this method.
ADDED: In response to comments, let me show you what I mean. I'm running VC On Windows, but this works on any language/OS. I took your first program and increased size to 20000 so it would run long enough. Then while it was running, I took several stackshots. They all look like this:
std::vector<bool,std::allocator<bool> >::begin() line 93 + 25 bytes
std::vector<bool,std::allocator<bool> >::operator[]() line 132 + 37 bytes
main() line 24 + 12 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()
So what that says is that it is spending essentially all of it's time in the indexing operation on line 24, and the reason it's spending that time is that the []
operator is calling the begin
operator. In more detail:
main() line 24 + 12 bytes
is this code:
for(int j = 0; j < size; j++){
==> v[i] = true;
}
that calls:
std::vector<bool,std::allocator<bool> >::operator[]() line 132 + 37 bytes
which is this code (that I reformatted a little bit):
reference operator[](size_type _P){
==> return (*(begin() + _P));
}
that calls:
std::vector<bool,std::allocator<bool> >::begin() line 93 + 25 bytes
which is doing this (in more detail):
92: iterator begin()
93: {return (_First); }
00402890 push ebp
00402891 mov ebp,esp
00402893 sub esp,44h
00402896 push ebx
00402897 push esi
00402898 push edi
00402899 push ecx
0040289A lea edi,[ebp-44h]
0040289D mov ecx,11h
004028A2 mov eax,0CCCCCCCCh
004028A7 rep stos dword ptr [edi]
004028A9 pop ecx <===============
004028AA mov dword ptr [ebp-4],ecx
004028AD mov eax,dword ptr [ebp-4]
004028B0 mov eax,dword ptr [eax+4]
004028B3 pop edi
004028B4 pop esi
004028B5 pop ebx
004028B6 mov esp,ebp
004028B8 pop ebp
004028B9 ret
What it is doing is writing 68 bytes of 0xCC
on the stack (for some debug reason) as part of getting the begin
address of the vector, as part of computing the address of v[i]
, prior to doing the assignment.
The fraction of time it spends doing this is near 100%, because it was doing it on every one of the several samples that were taken. Could you have guessed that that's what it was spending nearly all of its time doing? I couldn't have.
This is, of course, a Debug build. If you switch to a Release build, but turn on debugging information, all these functions get inlined and optimized away, so it goes like 30 times faster, and again the stackshots tell exactly what it's doing.
So - people can tell you what it might be doing, but this shows how to find out for yourself what it is really doing.
On your environment it will undoubtedly be different.