Here is the code for testing.
Tuple test:
using namespace std;
int main(){
vector<tuple<int,int>> v;
for (int var = 0; var < 100000000; ++var) {
v.push_back(make_tuple(var, var));
}
}
Pair test:
#include <vector>
using namespace std;
int main(){
vector<pair<int,int>> v;
for (int var = 0; var < 100000000; ++var) {
v.push_back(make_pair(var, var));
}
}
I did the time measurement via Linux time command. The results are:
| | -O0 | -O2 |
|:------|:-------:|:--------:|
| Pair | 8.9 s | 1.60 s |
| Tuple | 19.8 s | 1.96 s |
I am wondering, why is such a big difference between those two data structures in O0, as they should be very similar. There is just a small difference in 02.
Why is the difference in O0 so big, and why is there any difference at all?
EDIT:
The code with v.resize()
Pair:
#include <vector>
using namespace std;
int main(){
vector<pair<int,int>> v;
v.resize(100000000);
for (int var = 0; var < 100000000; ++var) {
v[var] = make_pair(var, var);
}
}
Tuple:
#include<tuple>
#include<vector>
using namespace std;
int main(){
vector<tuple<int,int>> v;
v.resize(100000000);
for (int var = 0; var < 100000000; ++var) {
v[var] = make_tuple(var, var);
}
}
Results:
| | -O0 | -O2 |
|:------|:-------:|:--------:|
| Pair | 5.01 s | 0.77 s |
| Tuple | 10.6 s | 0.87 s |
EDIT:
My system
g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
GLIBCXX_3.4.19
You are missing some crucial information: What compiler do you use? What do you use to measure the performance of the microbenchmark? What standard library implementation do you use?
My system:
Anyhow, I ran your examples, but reserved the proper size of the vectors first to get rid of the memory allocation overhead. With that, I funnily observe the opposite something interesting - the reverse of what you see:
versus:
as you can see, in my case the reason are the much higher number of stalled cycles, both in the frontend as well as in the backend.
Now where does this come from? I bet it comes down to some failed inlining, similar to what is explained here: std::vector performance regression when enabling C++11
Indeed, enabling
-flto
equalizes the results for me:and for tuple:
So remember,
-flto
is your friend and failed inlining can have extreme results on heavily templated code. Useperf stat
to find out what's happening.milianw didn't address the
-O0
vs.-O2
, so I'd like to add explanation for that.It is fully expected that
std::tuple
will be slower thanstd::pair
when not optimized, because it is more complicated object. A pair has exactly two members, so its methods are straightforward to define. But tuple has arbitrary number of members and the only way to iterate over template argument list is with recursion. Therefore most functions for tuple handle one member and then recurse to handle the rest, so for 2-tuple you have twice as many function calls.Now when they are optimized, the compiler will inline that recursion and there should not be significant difference. Which the tests clearly confirm. That applies to heavily templated stuff in general. Templates can be written to provide abstraction with no or very little runtime overhead, but that relies on optimizations to inline all the trivial functions.