Visual C++ ~ Not inlining simple const function po

Posted 2020-03-12 05:31

Question:

Dear StackOverflowers,

I have a simple piece of code that I am compiling with Microsoft Visual Studio C++ 2012:

int add(int x, int y)
{
    return x + y;
}

typedef int (*func_t)(int, int);

class A
{
public:
    const static func_t FP;
};

const func_t A::FP = &add;

int main()
{
    int x = 3;
    int y = 2;
    int z = A::FP(x, y);
    return 0;
}

The compiler generates the following code:

int main()
{
000000013FBA2430  sub         rsp,28h  
int x = 3;
int y = 2;
int z = A::FP(x, y);
000000013FBA2434  mov         edx,2  
000000013FBA2439  lea         ecx,[rdx+1]  
000000013FBA243C  call        qword ptr [A::FP (013FBA45C0h)]  
return 0;
000000013FBA2442  xor         eax,eax
}

I compiled this with 'Full optimisation' (/Ox flag) and 'Any Suitable' for Inline Function Expansion (/Ob2 flag).

I was wondering why the compiler doesn't inline this call, especially since the pointer is const. Does anyone have an idea why it is not inlined, and whether it's possible to make the compiler inline it?

Christian

EDIT: I am running some tests now, and MSVC also fails to inline calls through the function pointer when:

-I move the const pointer out of the class and make it global.

-I move the const pointer out of the class and make it local in main.

-I make the pointer non-const and move it into main as a local.

-I make the return type void and give the function no parameters.

I'm starting to believe Microsoft Visual Studio cannot inline calls through function pointers at all...

Answer 1:

The problem isn't with inlining, which the compiler does at every opportunity. The problem is that Visual C++ doesn't seem to realize that the pointer variable is actually a compile-time constant.

Test-case:

// function_pointer_resolution.cpp : Defines the entry point for the console application.
//

extern void show_int( int );

extern "C" typedef int binary_int_func( int, int );

extern "C" binary_int_func sum;
extern "C" binary_int_func* const sum_ptr = sum;

inline int call( binary_int_func* binary, int a, int b ) { return (*binary)(a, b); }

template< binary_int_func* binary >
inline int callt( int a, int b ) { return (*binary)(a, b); }

int main( void )
{
    show_int( sum(1, 2) );
    show_int( call(&sum, 3, 4) );
    show_int( callt<&sum>(5, 6) );
    show_int( (*sum_ptr)(1, 7) );
    show_int( call(sum_ptr, 3, 8) );
//  show_int( callt<sum_ptr>(5, 9) );
    return 0;
}

// sum.cpp
extern "C" int sum( int x, int y )
{
    return x + y;
}

// show_int.cpp
#include <iostream>

void show_int( int n )
{
    std::cout << n << std::endl;
}

The functions are separated into multiple compilation units to give better control over inlining. Specifically, I don't want show_int inlined, since it makes the assembly code messy.

The first whiff of trouble is that valid code (the commented line) is rejected by Visual C++. G++ has no problem with it, but Visual C++ complains "expected compile-time constant expression". This is actually a good predictor of all future behavior.

With optimization enabled and normal compilation semantics (no cross-module inlining), the compiler generates:

_main   PROC                        ; COMDAT

; 18   :    show_int( sum(1, 2) );

    push    2
    push    1
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 19   :    show_int( call(&sum, 3, 4) );

    push    4
    push    3
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 20   :    show_int( callt<&sum>(5, 6) );

    push    6
    push    5
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 21   :    show_int( (*sum_ptr)(1, 7) );

    push    7
    push    1
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 22   :    show_int( call(sum_ptr, 3, 8) );

    push    8
    push    3
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int
    add esp, 60                 ; 0000003cH

; 23   :    //show_int( callt<sum_ptr>(5, 9) );
; 24   :    return 0;

    xor eax, eax

; 25   : }

    ret 0
_main   ENDP

There's already a huge difference between using sum_ptr and not using sum_ptr. Statements using sum_ptr generate an indirect function call (call DWORD PTR _sum_ptr), while all other statements generate a direct function call (call _sum), even when the source code used a function pointer.

If we now enable cross-module inlining by compiling function_pointer_resolution.cpp and sum.cpp with /GL and linking with /LTCG, we find that the compiler inlines all direct calls. Indirect calls stay as-is.

_main   PROC                        ; COMDAT

; 18   :    show_int( sum(1, 2) );

    push    3
    call    ?show_int@@YAXH@Z           ; show_int

; 19   :    show_int( call(&sum, 3, 4) );

    push    7
    call    ?show_int@@YAXH@Z           ; show_int

; 20   :    show_int( callt<&sum>(5, 6) );

    push    11                  ; 0000000bH
    call    ?show_int@@YAXH@Z           ; show_int

; 21   :    show_int( (*sum_ptr)(1, 7) );

    push    7
    push    1
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 22   :    show_int( call(sum_ptr, 3, 8) );

    push    8
    push    3
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int
    add esp, 36                 ; 00000024H

; 23   :    //show_int( callt<sum_ptr>(5, 9) );
; 24   :    return 0;

    xor eax, eax

; 25   : }

    ret 0
_main   ENDP

Bottom-line: Yes, the compiler does inline calls made through a compile-time constant function pointer, as long as that function pointer is not read from a variable. This use of a function pointer got optimized:

call(&sum, 3, 4);

but this did not:

(*sum_ptr)(1, 7);

All tests run with Visual C++ 2010 Service Pack 1, compiling for x86, hosted on x64.

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86



Answer 2:

I think that you're right in this conclusion: "... cannot inline function pointers at all".

This very simple example also breaks optimization:

static inline
int add(int x, int y)
{
    return x + y;
}

int main()
{
    int x = 3;
    int y = 2;
    auto q = add;
    int z = q(x, y);
    return z;
}

Your sample is even more complex for the compiler, so it is not surprising that it is not inlined either.



Answer 3:

You can try __forceinline. Nobody is going to be able to tell you exactly why it isn't inlined. Common sense says to me that it should be, however. /O2 should favor code speed over code size (inlining)... Strange.



Answer 4:

This is not a real answer, but a "maybe workaround" one: STL from Microsoft once mentioned that lambdas are more easily inlineable than function pointers, so you could try that.

As a bit of trivia, Bjarne often mentions that sort is faster than qsort because qsort takes a function pointer, but as other people have noted, gcc has no problem inlining them... so maybe Bjarne should try gcc :P