I just found this answer from @tony-d with benchmark code to test virtual function call overhead. I checked his benchmark using g++:
$ g++ -O2 -o vdt vdt.cpp -lrt
$ ./vdt
virtual dispatch: 150000000 0.128562
switched: 150000000 0.0803207
overheads: 150000000 0.0543323
...
I got better performance than his (the ratio is about 2), but then I checked with clang:
$ clang++-3.7 -O2 -o vdt vdt.cpp -lrt
$ ./vdt
virtual dispatch: 150000000 0.462368
switched: 150000000 0.0569544
overheads: 150000000 0.0509332
...
Now the ratio goes up to about 70!
I then noticed the -lrt command line argument, and after a bit of googling about librt I tried without it for g++ and clang:
$ g++ -O2 -o vdt vdt.cpp
$ ./vdt
virtual dispatch: 150000000 0.4661
switched: 150000000 0.0815865
overheads: 150000000 0.0543611
...
$ clang++-3.7 -O2 -o vdt vdt.cpp
$ ./vdt
virtual dispatch: 150000000 0.155901
switched: 150000000 0.0568319
overheads: 150000000 0.0492521
...
As you can see, the performance is swapped.
From what I found about librt, it is needed for clock_gettime and other related time functions (maybe I am wrong, correct me in this case!), but the code compiles fine without -lrt, and the timings seem correct from what I see.
Why does linking or not linking librt affect that code so much?
Information about my system and compilers:
$ g++ --version
g++-5 (Ubuntu 5.3.0-3ubuntu1~14.04) 5.3.0 20151204
Copyright (C) 2015 Free Software Foundation, Inc.
$ clang++-3.7 --version
Debian clang version 3.7.1-svn254351-1~exp1 (branches/release_37) (based on LLVM 3.7.1)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ uname -a
Linux ****** 3.13.0-86-generic #130-Ubuntu SMP Mon Apr 18 18:27:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
I would guess that it is connected with the optimizer (if -lrt is specified, the optimizer has more data because of the attempt to link with the library, and can optimize differently).
As for the differences, with my g++ (4.8.4) I get the same results with and without -lrt, but with clang (3.4-1ubuntu3) there is a difference. I tried to run this through the perf tools statistics, with the following results:
What I can see is that there is some difference in branch prediction (branch-misses) with clang (which again points to the optimizer).