Why is inlining considered faster than a function call?

Posted 2019-01-21 18:47

Now, I know it's because there's no overhead of calling a function, but is the overhead of calling a function really that heavy (and worth the bloat of having it inlined)?

From what I can remember, when a function is called, say f(x, y), x and y are pushed onto the stack, and execution jumps to the function's code. I know this is a bit of an oversimplification, but am I missing something? A few pushes and a jump to call a function - is there really that much overhead?

Let me know if I'm forgetting something, thanks!

Tags: c++ stack inline

16 Answers
神经病院院长
Answer 2 · 2019-01-21 19:14

Let

int sum(const int &a, const int &b)
{
    return a + b;
}
int a = sum(b, c);

be inlined. It is then equivalent to

int a = b + c;

No call, no jump - no overhead.

Answer 3 · 2019-01-21 19:14

Because there's no call. The function's code is simply copied into the call site.

贼婆χ
Answer 4 · 2019-01-21 19:23

A typical example of where it makes a big difference is std::sort, which performs O(N log N) calls to its comparison function.

Try creating a large vector and calling std::sort on it, first with an inlined comparison function and then with a non-inlined one, and measure the performance.

This, by the way, is why std::sort in C++ is faster than qsort in C, which has to go through a function pointer for every comparison.
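
A rough sketch of that measurement (my own, not code from the original answer; the vector size and timing harness are arbitrary) could look like this:

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

// qsort comparator: reached through a function pointer, so the compiler
// generally cannot inline it into the sorting loop.
int cmp_int(const void* a, const void* b)
{
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

int main()
{
    std::vector<int> v1(5000000);
    for (int& x : v1) x = std::rand();
    std::vector<int> v2 = v1;

    const auto t0 = std::chrono::steady_clock::now();
    // The lambda's type is known to std::sort's template, so the comparison
    // can be inlined into the sorting loop.
    std::sort(v1.begin(), v1.end(), [](int a, int b) { return a < b; });
    const auto t1 = std::chrono::steady_clock::now();

    std::qsort(v2.data(), v2.size(), sizeof(int), cmp_int);
    const auto t2 = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> sort_ms  = t1 - t0;
    const std::chrono::duration<double, std::milli> qsort_ms = t2 - t1;
    std::cout << "std::sort: " << sort_ms.count()  << " ms\n"
              << "qsort:     " << qsort_ms.count() << " ms\n";
    return 0;
}

Compiled with optimizations enabled, the std::sort call with the inlinable lambda typically comes out noticeably ahead of qsort.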

乱世女痞
Answer 5 · 2019-01-21 19:23

Andrey's answer already gives you a very comprehensive explanation. But just to add one point that he missed: inlining can also be extremely valuable for very short functions.

If a function body consists of just a few instructions, then the prologue/epilogue code (the push/pop/call instructions, basically) might actually be more expensive than the function body itself. If you call such a function often (say, from a tight loop), then unless the function is inlined, you can end up spending the majority of your CPU time on the function call, rather than the actual contents of the function.

What matters isn't really the cost of a function call in absolute terms (where it might take just 5 clock cycles or something like that), but how long it takes relative to how often the function is called. If the function is so short that it can be called every 10 clock cycles, then spending 5 cycles for every call on "unnecessary" push/pop instructions is pretty bad.
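
To make that concrete, here is a minimal sketch of my own (not from the answer above): a one-line function called from a tight loop, which is exactly the situation where the push/call/ret machinery can rival the useful work.

#include <cstdint>

// The body is a single add. Without inlining, the call and return around it
// can cost as much as the add itself.
static int add_one(int x)
{
    return x + 1;
}

std::uint64_t sum_loop(int n)
{
    std::uint64_t total = 0;
    for (int i = 0; i < n; ++i)
        total += add_one(i);   // one call per iteration of a tight loop
    return total;
}

With optimizations on, any modern compiler will inline add_one here by itself; the sketch only shows the shape of code where, without inlining, the call overhead would dominate.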

Summer. ? 凉城
Answer 6 · 2019-01-21 19:24

Consider a simple function like:

int SimpleFunc (const int X, const int Y)
{
    return (X + 3 * Y); 
}    

int main(int argc, char* argv[])
{
    int Test = SimpleFunc(11, 12);
    return 0;
}

This is converted to the following code (MSVC++ v6, debug):

10:   int SimpleFunc (const int X, const int Y)
11:   {
00401020   push        ebp
00401021   mov         ebp,esp
00401023   sub         esp,40h
00401026   push        ebx
00401027   push        esi
00401028   push        edi
00401029   lea         edi,[ebp-40h]
0040102C   mov         ecx,10h
00401031   mov         eax,0CCCCCCCCh
00401036   rep stos    dword ptr [edi]

12:       return (X + 3 * Y);
00401038   mov         eax,dword ptr [ebp+0Ch]
0040103B   imul        eax,eax,3
0040103E   mov         ecx,dword ptr [ebp+8]
00401041   add         eax,ecx

13:   }
00401043   pop         edi
00401044   pop         esi
00401045   pop         ebx
00401046   mov         esp,ebp
00401048   pop         ebp
00401049   ret

You can see that there are just 4 instructions for the function body but 15 instructions of pure function overhead, not counting another 3 for calling the function itself. If all instructions took the same time (they don't), then about 80% of this code would be function overhead.

For a trivial function like this, there is a good chance that the overhead code takes just as long to run as the function body itself. When such trivial functions are called millions or billions of times from inside a deep loop, the call overhead starts to add up.

As always, the key is profiling/measuring to determine whether inlining a specific function yields any net performance gain. For more "complex" functions that are not called "often", the gain from inlining may be immeasurably small.

姐就是有狂的资本
Answer 7 · 2019-01-21 19:26

The classic candidate for inlining is an accessor, like std::vector<T>::size().

With inlining enabled, this is just a fetch of a variable from memory, likely a single instruction on most architectures. The "few pushes and a jump" (plus the return) is easily several times as much.

Add to that the fact that the more code an optimizer can see at once, the better it can do its work. With lots of inlining, it sees lots of code at once. That means it might be able to keep the value in a CPU register and completely spare the costly trip to memory. Now we might be talking about a difference of several orders of magnitude.
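
As a small illustration of my own (not part of the original answer), consider a loop whose bound is vector::size():

#include <cstddef>
#include <vector>

long long sum(const std::vector<int>& v)
{
    long long total = 0;
    // Once size() and operator[] are inlined, the optimizer can see that the
    // bound is loop-invariant, keep it (and the data pointer) in registers,
    // and avoid re-reading them from memory on every iteration.
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}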

And then there's template meta-programming. Sometimes this results in calling many small functions recursively, just to fetch a single value at the end of the recursion. (Think of fetching the value of the first entry of a specific type in a tuple with dozens of objects.) With inlining enabled, the optimizer can directly access that value (which, remember, might be in a register), collapsing dozens of function calls into accessing a single value in a CPU register. This can turn a terrible performance hog into a nice and speedy program.
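
A sketch of what that can look like (a hypothetical helper of my own, requiring C++17, not code from the original answer):

#include <cstddef>
#include <tuple>
#include <type_traits>

// A recursive "first element of type T" lookup of the kind described above.
// Each recursion step is another tiny function call; with inlining, the whole
// chain typically collapses into a single direct access.
template <typename T, std::size_t I = 0, typename Tuple>
const T& first_of_type(const Tuple& t)
{
    if constexpr (std::is_same_v<std::tuple_element_t<I, Tuple>, T>)
        return std::get<I>(t);
    else
        return first_of_type<T, I + 1>(t);
}

int main()
{
    std::tuple<char, int, double, long> t{'a', 1, 2.5, 7L};
    // After inlining, this recursive lookup usually reduces to one load of
    // the double member, possibly straight from a register.
    const double d = first_of_type<double>(t);
    return static_cast<int>(d);
}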


Hiding state as private data in objects (encapsulation) has its costs. Inlining was part of C++ from the very beginning in order to minimize these costs of abstraction. Back then, compilers were significantly worse at detecting good candidates for inlining (and rejecting bad ones) than they are today, so manually inlining resulted in considerable speed gains.

Nowadays compilers are reputed to be much cleverer than we are about inlining: they inline functions automatically, and they may decline to inline functions marked inline even though they could. Some say that inlining should be left to the compiler completely and that we shouldn't even bother marking functions as inline. However, I have yet to see a comprehensive study showing whether manually doing so is still worth it or not. So for the time being, I'll keep doing it myself, and let the compiler override that if it thinks it can do better.
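
For reference, this is all that marking a function inline amounts to in practice (a minimal hypothetical header of my own): the keyword is a hint plus a one-definition-rule guarantee, and the optimizer remains free to ignore it or to inline unmarked functions.

// point.h (hypothetical)
#ifndef POINT_H
#define POINT_H

class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}

    // Defined inside the class body, so it is implicitly inline: the classic
    // cheap accessor the optimizer will almost certainly expand at call sites.
    int x() const { return x_; }

    int y() const;

private:
    int x_;
    int y_;
};

// Marked inline so the definition may live in the header without violating
// the one-definition rule; whether it is actually inlined remains the
// compiler's decision.
inline int Point::y() const { return y_; }

#endif // POINT_H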
