Dear StackOverflowers,
I have a simple piece of code which I am compiling with Microsoft Visual Studio C++ 2012:
int add(int x, int y)
{
    return x + y;
}

typedef int (*func_t)(int, int);

class A
{
public:
    const static func_t FP;
};

const func_t A::FP = &add;

int main()
{
    int x = 3;
    int y = 2;
    int z = A::FP(x, y);
    return 0;
}
The compiler generates the following code:
int main()
{
000000013FBA2430 sub rsp,28h
int x = 3;
int y = 2;
int z = A::FP(x, y);
000000013FBA2434 mov edx,2
000000013FBA2439 lea ecx,[rdx+1]
000000013FBA243C call qword ptr [A::FP (013FBA45C0h)]
return 0;
000000013FBA2442 xor eax,eax
}
I compiled this with 'Full Optimization' (/Ox flag) and 'Any Suitable' for Inline Function Expansion (/Ob2 flag).
I was wondering why the compiler doesn't inline this call, especially since the pointer is const. Does anyone have an idea why it is not inlined, and whether it's possible to make the compiler inline it?
Christian
EDIT: I am running some tests now, and MSVC also fails to inline calls through the function pointer when:
- I move the const pointer out of the class and make it global.
- I move the const pointer out of the class and make it local in main.
- I make the pointer non-const and make it local in main.
- I make the return type void and give the function no parameters.
I am starting to believe Microsoft Visual Studio cannot inline calls through function pointers at all...
The problem isn't with inlining, which the compiler does at every opportunity. The problem is that Visual C++ doesn't seem to realize that the pointer variable is actually a compile-time constant.
Test-case:
// function_pointer_resolution.cpp : Defines the entry point for the console application.
//
extern void show_int( int );
extern "C" typedef int binary_int_func( int, int );
extern "C" binary_int_func sum;
extern "C" binary_int_func* const sum_ptr = sum;
inline int call( binary_int_func* binary, int a, int b ) { return (*binary)(a, b); }
template< binary_int_func* binary >
inline int callt( int a, int b ) { return (*binary)(a, b); }
int main( void )
{
    show_int( sum(1, 2) );
    show_int( call(&sum, 3, 4) );
    show_int( callt<&sum>(5, 6) );
    show_int( (*sum_ptr)(1, 7) );
    show_int( call(sum_ptr, 3, 8) );
    // show_int( callt<sum_ptr>(5, 9) );
    return 0;
}

// sum.cpp
extern "C" int sum( int x, int y )
{
    return x + y;
}

// show_int.cpp
#include <iostream>
void show_int( int n )
{
    std::cout << n << std::endl;
}
The functions are separated into multiple compilation units to give better control over inlining. Specifically, I don't want show_int inlined, since it makes the assembly code messy.
The first whiff of trouble is that valid code (the commented line) is rejected by Visual C++. G++ has no problem with it, but Visual C++ complains "expected compile-time constant expression". This is actually a good predictor of all future behavior.
With optimization enabled and normal compilation semantics (no cross-module inlining), the compiler generates:
_main PROC ; COMDAT
; 18 : show_int( sum(1, 2) );
push 2
push 1
call _sum
push eax
call ?show_int@@YAXH@Z ; show_int
; 19 : show_int( call(&sum, 3, 4) );
push 4
push 3
call _sum
push eax
call ?show_int@@YAXH@Z ; show_int
; 20 : show_int( callt<&sum>(5, 6) );
push 6
push 5
call _sum
push eax
call ?show_int@@YAXH@Z ; show_int
; 21 : show_int( (*sum_ptr)(1, 7) );
push 7
push 1
call DWORD PTR _sum_ptr
push eax
call ?show_int@@YAXH@Z ; show_int
; 22 : show_int( call(sum_ptr, 3, 8) );
push 8
push 3
call DWORD PTR _sum_ptr
push eax
call ?show_int@@YAXH@Z ; show_int
add esp, 60 ; 0000003cH
; 23 : //show_int( callt<sum_ptr>(5, 9) );
; 24 : return 0;
xor eax, eax
; 25 : }
ret 0
_main ENDP
There's already a huge difference between using sum_ptr and not using sum_ptr. Statements using sum_ptr generate an indirect function call (call DWORD PTR _sum_ptr), while all other statements generate a direct function call (call _sum), even when the source code used a function pointer.
If we now enable inlining by compiling function_pointer_resolution.cpp and sum.cpp with /GL and linking with /LTCG, we find that the compiler inlines all direct calls. Indirect calls stay as-is.
_main PROC ; COMDAT
; 18 : show_int( sum(1, 2) );
push 3
call ?show_int@@YAXH@Z ; show_int
; 19 : show_int( call(&sum, 3, 4) );
push 7
call ?show_int@@YAXH@Z ; show_int
; 20 : show_int( callt<&sum>(5, 6) );
push 11 ; 0000000bH
call ?show_int@@YAXH@Z ; show_int
; 21 : show_int( (*sum_ptr)(1, 7) );
push 7
push 1
call DWORD PTR _sum_ptr
push eax
call ?show_int@@YAXH@Z ; show_int
; 22 : show_int( call(sum_ptr, 3, 8) );
push 8
push 3
call DWORD PTR _sum_ptr
push eax
call ?show_int@@YAXH@Z ; show_int
add esp, 36 ; 00000024H
; 23 : //show_int( callt<sum_ptr>(5, 9) );
; 24 : return 0;
xor eax, eax
; 25 : }
ret 0
_main ENDP
Bottom line: yes, the compiler does inline calls made through a compile-time constant function pointer, as long as that function pointer is not read from a variable. This use of a function pointer got optimized:
call(&sum, 3, 4);
but this did not:
(*sum_ptr)(1, 7);
All tests run with Visual C++ 2010 Service Pack 1, compiling for x86, hosted on x64.
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
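Given that result, one possible workaround is to avoid storing the function's address in a variable at all. The following is only a sketch under that assumption, not something verified against any particular Visual C++ release: the names ViaPointer, ViaForwarder, ViaTemplate, AddViaTemplate and check_all_forms are hypothetical, and add and func_t are taken from the question. It keeps the A::FP(x, y) call syntax while replacing the const pointer variable with either a forwarding function or a non-type template argument spelled as &add, which are the forms that produced direct calls in the listings above.

```cpp
#include <cassert>

int add(int x, int y) { return x + y; }

typedef int (*func_t)(int, int);

// Form from the question: the call reads the pointer from a variable.
struct ViaPointer {
    static const func_t FP;
};
const func_t ViaPointer::FP = &add;

// Hypothetical alternative 1: a forwarding function with the same call
// syntax. The call to add is a direct call in the source.
struct ViaForwarder {
    static int FP(int x, int y) { return add(x, y); }
};

// Hypothetical alternative 2: the target is a non-type template argument,
// so it is part of the type rather than a value stored in memory.
template <func_t F>
struct ViaTemplate {
    static int FP(int x, int y) { return F(x, y); }
};
typedef ViaTemplate<&add> AddViaTemplate;

// All three forms compute the same result; only the generated call differs.
inline void check_all_forms() {
    assert(ViaPointer::FP(3, 2) == 5);
    assert(ViaForwarder::FP(3, 2) == 5);
    assert(AddViaTemplate::FP(3, 2) == 5);
}
```

Whether a given compiler actually inlines the second and third forms still has to be confirmed by inspecting the generated assembly, as done above.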
I think that you're right in this conclusion: "... cannot inline function pointers at all".
This very simple example also breaks optimization:
static inline
int add(int x, int y)
{
    return x + y;
}

int main()
{
    int x = 3;
    int y = 2;
    auto q = add;
    int z = q(x, y);
    return z;
}
Your sample is even more complex for the compiler, so it is not surprising.
You can try __forceinline. Nobody is going to be able to tell you exactly why it isn't inlined. Common sense tells me that it should be, however. /O2 should favor code speed over code size (which includes inlining)... Strange.
This is not a real answer, but a "maybe workaround" one:
STL (Stephan T. Lavavej) from Microsoft once mentioned that lambdas are more easily inlined than function pointers, so you could try that.
As a bit of trivia, Bjarne often mentions that sort is faster than qsort because qsort takes a function pointer, but as other people have noted, gcc has no problem inlining calls through them... so maybe Bjarne should try gcc :P
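To make the sort versus qsort comparison concrete, here is a minimal sketch of the two styles. The helper names compare_ints, sort_with_qsort, sort_with_lambda and check_equivalence are made up for illustration. With std::qsort the comparator arrives as a function pointer value at run time, whereas with std::sort the lambda's type is a template argument, so the call target is known at compile time. Whether either call ends up inlined depends on the compiler and its settings; the sketch only shows the difference in how the comparator is passed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparator with the signature std::qsort requires; it is passed as a
// function pointer and called indirectly from inside qsort.
extern "C" int compare_ints(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

// Sorts a copy of v in ascending order using std::qsort.
std::vector<int> sort_with_qsort(std::vector<int> v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    }
    return v;
}

// Sorts a copy of v in ascending order using std::sort with a lambda.
// The lambda's closure type is part of the std::sort instantiation.
std::vector<int> sort_with_lambda(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}

// Both approaches must produce the same ordering.
inline void check_equivalence() {
    std::vector<int> input;
    input.push_back(3);
    input.push_back(1);
    input.push_back(2);
    assert(sort_with_qsort(input) == sort_with_lambda(input));
}
```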