I'm observing unexpected behaviour (at least I can't find an explanation for it) with the GCC flag -flto and jemalloc/tcmalloc. Once -flto is used and I link with the above libraries, malloc/calloc and friends are not replaced by the je/tc malloc implementation; the glibc implementation is called instead. Once I remove the -flto flag, everything works as expected. I tried to use -fno-builtin/-fno-builtin-* with -flto, but it still doesn't pick the je/tc malloc implementation.
How does the -flto machinery work? Why doesn't the binary pick the new implementation? How does it even link with -fno-builtin, when it should fail on an unresolved external for, say, printf?
EDIT001:
GCC 7.3
Sample code
int main()
{
    auto p = malloc(1024);
    free(p);
    return 0;
}
Compilation:
/usr/bin/c++ -O2 -g -DNDEBUG -flto -std=gnu++14 -o CMakeFiles/flto.dir/main.cpp.o -c /home/user/Development/CPPJunk/flto/main.cpp
Linkage:
/usr/bin/c++ -O2 -g -DNDEBUG -flto CMakeFiles/flto.dir/main.cpp.o -o flto -L/home/user/Development/jemalloc -Wl,-rpath,/home/user/Development/jemalloc -ljemalloc
EDIT002:
More suitable sample code
#include <cstdlib>
int main()
{
    auto p = malloc(1024);
    if (p) {
        free(p);
    }
    auto p1 = new int;
    if (p1) {
        delete p1;
    }
    auto p2 = new int[32];
    if (p2) {
        delete[] p2;
    }
    return 0;
}
First, your sample code is wrong. Read the C11 standard (n1570) carefully. When you want to use the standard malloc, you should #include <stdlib.h>.
In C++11 (read n3337), malloc is frowned upon and should not be used (prefer new). If you still want to use std::malloc in C++, you should #include <cstdlib> (which, in GCC, internally includes <stdlib.h>).
Then your sample code is almost C code (once you replace auto with void*), not C++. It could be optimized (once you include <stdlib.h>), even without -flto but with just -O3, according to the as-if rule, to an empty main. (I even wrote a public report, bismon-chariot-doc.pdf, whose section §1.4.2 explains over several pages how that optimization happens.)
To optimize around malloc and free, GCC uses the __attribute__((malloc)) function attribute in the declaration of malloc (inside <stdlib.h>).
LTO is explained in GCC internals §25. It works by using some internal (GIMPLE-like and/or SSA-like) representation of the code both at "compile" and at "link" time (actually, the linking step becomes another compilation with whole-program optimization, so your code gets "compiled" twice in practice).
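The as-if point made above can be illustrated with a small sketch. The function names allocate_unobserved and allocate_observed are hypothetical, and the volatile trick is an assumption about how optimizing compilers typically behave, not something taken from the report mentioned above. The first function has no observable use of the allocation, so the compiler is allowed to remove the malloc/free pair; the second one routes the pointer through a volatile object, which is expected to keep the call.

```cpp
#include <cstdlib>

// The result of malloc is never observed, so under the as-if rule an
// optimizing compiler is allowed to drop the malloc/free pair entirely.
int allocate_unobserved() {
    void* p = std::malloc(1024);
    std::free(p);
    return 0;
}

// Storing the pointer into a volatile object and reading it back are
// observable accesses, so the compiler is expected to keep the call
// that produced the pointer.
int allocate_observed() {
    void* volatile sink = std::malloc(1024);  // volatile write
    void* p = sink;                           // volatile read
    if (p == nullptr) {
        return 1;                             // allocation failed
    }
    std::free(p);
    return 0;
}
```

Comparing the assembly produced by g++ -O2 -S for the two functions should show whether a call to malloc survives in each of them; if it does not survive, no allocator (glibc's or jemalloc's) is invoked for that code at all.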
LTO should always (in practice) be used with some optimization flag (e.g. -O2 or even -O3) both at compile and at link time. So you should compile and link with g++ -flto -O2 (it makes no practical sense to use -flto without at least -O2, and the exact same optimization flags should be used at compile and at link time).
More precisely, -flto also embeds in the object files some internal (GIMPLE-like) representation of the source code, and that is also used "at link time" (notably for the optimization and inlining that happen again when "linking" your entire program, re-using its GIMPLE). Actually, GCC contains an LTO front-end and compiler called lto1 (in addition to the C++ front-end and compiler called cc1plus), and lto1 is used at link time (when you link with g++ -flto -O2) to reprocess these GIMPLE representations.
Probably, libjemalloc has its own headers, and might have inline (or inlinable) functions. Then you also need to use -flto -O2 when compiling that library from its source code (so that its GIMPLE is stored in the library).
Finally, the fact that the usual malloc gets called is independent of -flto. It is a linker issue, not a compiler one. You could try to link -ljemalloc statically (and then you had better build that library with gcc -flto -O2 as well; if you don't build it like that, you won't get LTO optimizations across malloc calls).
You could also pass -v to your compilation and linking commands to understand what g++ is doing. You could even pass -Wl,--verbose to ask the ld (started by g++) to be verbose.
Notice that LTO (and the internal representations that it uses) is compiler and version specific. The internal (GIMPLE & SSA) representation is slightly different between GCC 7 & GCC 8 (and in Clang it is very different, so of course incompatible). The dynamic linker ld-linux(8) does not know about LTO.
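Since the question of which malloc ends up being used is decided at link and load time, one rough diagnostic is to check whether the jemalloc shared object is mapped into the running process at all. Below is a minimal, Linux-specific sketch that scans /proc/self/maps; the helper names mapped_paths_containing and loaded_libraries_containing are hypothetical, and the sketch only tells you that a library is loaded, not which malloc symbol each call is bound to.

```cpp
#include <fstream>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: collect the distinct path names found in
// /proc/<pid>/maps-style text that contain `needle`.
std::set<std::string> mapped_paths_containing(std::istream& maps,
                                              const std::string& needle) {
    std::set<std::string> hits;
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        // Five whitespace-separated fields come first; the path, if any, follows.
        if (!(fields >> range >> perms >> offset >> dev >> inode)) {
            continue;
        }
        std::getline(fields >> std::ws, path);
        if (!path.empty() && path.find(needle) != std::string::npos) {
            hits.insert(path);
        }
    }
    return hits;
}

// Convenience wrapper for the current process (Linux only).
std::set<std::string> loaded_libraries_containing(const std::string& needle) {
    std::ifstream maps("/proc/self/maps");
    return mapped_paths_containing(maps, needle);
}
```

Calling loaded_libraries_containing("jemalloc") early in main and printing the result would show whether the library was actually loaded; an empty result suggests the problem lies in how the executable was linked rather than in the allocator itself.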
PS. You could install the libjemalloc-dev package and add #include <jemalloc/jemalloc.h> in your code. See also the jemalloc(3) man page. Probably libjemalloc could be configured or patched to define some je_malloc symbol as a replacement for malloc. Then it would be simpler (for LTO) to use je_malloc in your code (to avoid conflicts between several malloc ELF symbols). To learn more about symbols in shared libraries, read Drepper's How to Write Shared Libraries paper. And of course you should expect LTO to change the behavior of linking!