I wanted to time a few functions' execution and I've written myself a helper:
#include <chrono>
#include <iostream>
#include <string>
using namespace std;
// Calls fun(args...) N times and prints the total elapsed time in milliseconds.
template<int N = 1, class Fun, class... Args>
void timeExec(string name, Fun fun, Args... args) {
    auto start = chrono::steady_clock::now();
    for(int i = 0; i < N; ++i) {
        fun(args...);
    }
    auto end = chrono::steady_clock::now();
    auto diff = end - start;
    cout << name << ": " << chrono::duration<double, milli>(diff).count() << " ms." << endl;
}
I figured that to time member functions this way I'd have to use bind or a lambda, and I wanted to see which would affect performance less, so I did:
const int TIMES = 10000;
timeExec<TIMES>("Bind evaluation", bind(&decltype(result)::eval, &result));
timeExec<1>("Lambda evaluation", [&]() {
    for(int i = 0; i < TIMES; ++i) {
        result.eval();
    }
});
The results are:
Bind evaluation: 0.355158 ms.
Lambda evaluation: 0.014414 ms.
I don't know the internals, but I assume that a lambda cannot be that much better than bind. The only plausible explanation I can think of is the compiler optimizing out subsequent function evaluations in the lambda's loop.
How would you explain it?
I've tested it. My results show that the lambda is actually faster than bind.
This is the code (please don't look at style):
Console results (Release with optimization):
I compiled it with Visual Studio Enterprise 2015 in Release mode with Full Optimization (/Ox) and in Debug mode with optimization disabled. The results confirm that the lambda is faster than bind on my laptop (Dell Inspiron 7537, Intel Core i7-4510U 2.00GHz, 8GB RAM).
Can anyone verify this on their own machine?
That's quite a preconception.
Lambdas are tied into the compiler internals, so extra optimization opportunities may be found. Moreover, they're designed to avoid inefficiency.
However, there are probably no compiler optimization tricks happening here. The likely culprit is the argument to bind, bind(&decltype(result)::eval, &result). You are passing a pointer-to-member-function (PTMF) and an object. Unlike the lambda's type, the PTMF's type does not capture which function actually gets called; it contains only the function signature (parameter and return types). The slow loop is using an indirect function call, because the compiler failed to resolve the function pointer through constant propagation.
If you rename the member eval() to operator () () and get rid of bind, then the explicit object will essentially behave like the lambda and the performance difference should disappear.