Now, I know it's because there's not the overhead of calling a function, but is the overhead of calling a function really that heavy (and worth the bloat of having it inlined) ?
From what I can remember, when a function is called, say f(x,y), x and y are pushed onto the stack, and the stack pointer jumps to an empty block, and begins execution. I know this is a bit of an oversimplification, but am I missing something? A few pushes and a jump to call a function, is there really that much overhead?
Let me know if I'm forgetting something, thanks!
There is no calling and stack activity, which certainly saves a few CPU cycles. In modern CPU's, code locality also matters: doing a call can flush the instruction pipeline and force the CPU to wait for memory being fetched. This matters a lot in tight loops, since primary memory is quite a lot slower than modern CPU's.
However, don't worry about inlining if your code is only being called a few times in your application. Worry, a lot, if it's being called millions of times while the user waits for answers!
One other potential side effect of the jump is that you might trigger a page fault, either to load the code into memory the first time, or if it's used infrequently enough to get paged out of memory later.
There are multiple reasons for inlining to be faster, only one of which is obvious:
The cache utilization can also work against you - if inlining makes the code larger, there's more possibility of cache misses. That's a much less likely case though.
Optimizing compilers apply a set of heuristics to determine whether or not inlining will be beneficial.
Sometimes gain from the lack of function call will outweigh the potential cost of the extra code, sometimes not.