Toying around with clang, I compiled a C program containing this line:
printf("%s\n", argv[0]);
When compiling without optimization, the assembly output called printf
after setting up the registers:
movq (%rcx), %rsi
movq %rax, %rdi
movb $0, %al
callq _printf
I then tried compiling with clang -O2. The printf call was replaced with a puts call:
movq (%rsi), %rdi
callq _puts
While this makes perfect sense in this case, it raises two questions:
- How often does function call substitution happen in optimized compilation? Is this frequent or is stdio an exception?
- Could I write compiler optimizations for my own libraries? How would I do that?
- How often does function call substitution happen in optimized compilation? Is this frequent or is stdio an exception?
The optimization that replaces printf with puts in LLVM is in the class LibCallSimplifier. You can see the header file in llvm/include/llvm/Transforms/Utils/SimplifyLibCalls.h and the implementation in llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp. Looking at those files will show you some of the other library call replacement optimizations that are done (the header file is probably easier to start with). And of course there are many other optimizations that LLVM does; you can get an idea of some of them by looking at the list of LLVM passes.
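To see a few of these rewrites for yourself, a small test file like the one below can be compiled with clang -O2 -S and the assembly inspected. The function names are made up for illustration, and the rewrites named in the comments are the ones I would expect LibCallSimplifier to perform; the actual output depends on the clang version and target, so treat the comments as things to check rather than guarantees.

```c
#include <stdio.h>
#include <string.h>

/* Result unused and the format is "%s\n": candidate for puts(s). */
void print_line(const char *s) {
    printf("%s\n", s);
}

/* Constant string ending in a newline: candidate for puts("hello"). */
void print_hello(void) {
    printf("hello\n");
}

/* Single-character format: candidate for putchar('x'). */
void print_x(void) {
    printf("x");
}

/* strlen of a string literal: candidate for folding to the constant 5. */
size_t hello_length(void) {
    return strlen("hello");
}
```

Note that the printf rewrites only apply when the return value of printf is ignored, because puts and putchar return different values than printf does.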
- Could I write compiler optimizations for my own libraries? How would I do that?
Yes, you could. LLVM is very modular and performs transformations on the IR in a series of passes. So if you wanted to add a custom pass for your own library you could do so (although it is still a fair amount of work to understand how the LLVM compiler flow works). A good starting point is the document: Writing an LLVM Pass.
This sort of optimization depends on the compiler knowing that a function named printf can only be the printf function as defined by the C standard. If a program defines printf to mean something else, then the program invokes undefined behaviour. This lets the compiler substitute a call to puts in cases where it would work "as if" the standard printf function was being called. It doesn't have to worry about it working "as if" a user-defined printf function was called. So these kinds of function substitution optimizations are pretty much limited to functions defined in the C or C++ standards. (Maybe other standards as well, if the compiler somehow knows that a given standard is in force.)
Short of modifying the source code of the compiler yourself, there's no way to tell the compiler that these kinds of function substitutions are possible with your own functions. However, with limitations, you can do something similar with inline functions. For example, you could implement something similar to the printf/puts optimization with something like this:
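#include <string.h>  /* strcmp */

/* static inline: usable from a header without a separate external definition.
   myputs and _myprintf_impl are assumed to be declared elsewhere. */
static inline int myprintf(char const *fmt, char const *arg) {
    if (strcmp(fmt, "%s\n") == 0) {
        return myputs(arg);
    }
    return _myprintf_impl(fmt, arg);
}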
With optimization turned on, the compiler can choose at compile time which function to call based on the fmt parameter, but only if it can determine that it's a constant string. If it can't, or optimization isn't enabled, then the compiler has to emit code that checks it on each call, and that could easily turn this into a pessimization. Note that this optimization depends on the compiler knowing how strcmp works and removing the call entirely, and so is an example of another library function call substitution the compiler can make.
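For experimenting, here is a self-contained sketch of the strcmp version above. The bodies of myputs and _myprintf_impl, the two call counters, and the wrappers print_line and print_with are stand-ins I made up so the file compiles on its own; they are not part of any real library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical counters, only here so the chosen path can be observed. */
static int myputs_calls = 0;
static int impl_calls = 0;

/* Stand-in for the fast path: prints s followed by a newline. */
static int myputs(char const *s) {
    ++myputs_calls;
    return puts(s);
}

/* Stand-in for the general formatter; handles one string argument. */
static int _myprintf_impl(char const *fmt, char const *arg) {
    ++impl_calls;
    return printf(fmt, arg);
}

static inline int myprintf(char const *fmt, char const *arg) {
    if (strcmp(fmt, "%s\n") == 0) {
        return myputs(arg);
    }
    return _myprintf_impl(fmt, arg);
}

/* The format is a literal, so an optimizing compiler has the chance
   to fold the strcmp away and call myputs directly. */
int print_line(char const *s) {
    return myprintf("%s\n", s);
}

/* The format is only known at run time, so the strcmp has to stay. */
int print_with(char const *fmt, char const *s) {
    return myprintf(fmt, s);
}
```

Compiling this with -O2 -S and comparing the code generated for print_line and print_with should show whether your compiler removed the comparison in the first case.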
You can improve on this with GCC's __builtin_constant_p function:
/* static inline, so no separate external definition is required in C99. */
static inline int myprintf(char const *fmt, char const *arg) {
    if (__builtin_constant_p(fmt[0])
        && strcmp(fmt, "%s\n") == 0) {
        return myputs(arg);
    }
    return _myprintf_impl(fmt, arg);
}
Under GCC this results in code that never checks the format string at run time. If GCC can determine at compile time that fmt is "%s\n" then it generates code that calls myputs unconditionally; otherwise it generates code that calls _myprintf_impl unconditionally. So with optimization enabled this function is never a pessimization. Unfortunately, while clang supports the __builtin_constant_p function, my version of clang always generates code that calls _myprintf_impl unconditionally.
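puts is a much smaller function than printf, and executables that use it instead are typically half the size. printf is only needed when converting numbers to strings for printing, and you can do that conversion yourself using itoa() (which is not part of standard C, although many C libraries provide it).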
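As a rough sketch of that idea: since itoa is not available everywhere, the example below uses a hand-written conversion instead. The names int_to_dec and print_int_line are made up for illustration, and the buffer sizes assume a 32-bit int.

```c
#include <stdio.h>

/* Hypothetical helper: writes the decimal form of value into buf and
   returns buf. buf must hold at least 12 bytes for a 32-bit int
   (sign, up to 10 digits, terminating NUL). */
char *int_to_dec(int value, char *buf) {
    char tmp[12];
    int n = 0;
    /* Work on the magnitude as unsigned so INT_MIN does not overflow. */
    unsigned int mag = value < 0 ? 0u - (unsigned int)value
                                 : (unsigned int)value;

    /* Collect digits in reverse order, least significant first. */
    do {
        tmp[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0u);

    char *out = buf;
    if (value < 0) {
        *out++ = '-';
    }
    while (n > 0) {
        *out++ = tmp[--n];
    }
    *out = '\0';
    return buf;
}

/* Prints value followed by a newline using only puts. */
int print_int_line(int value) {
    char buf[12];
    return puts(int_to_dec(value, buf));
}
```

Whether this actually shrinks the executable depends on how the C library is linked, so it is worth measuring before and after.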