I've been digging through some parts of the Linux kernel, and found calls like this:
if (unlikely(fd < 0))
{
/* Do something */
}
or
if (likely(!err))
{
/* Do something */
}
I've found the definition of them:
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
I know that they are for optimization, but how do they work? And how much performance/size decrease can be expected from using them? And is it worth the hassle (and losing the portability probably) at least in bottleneck code (in userspace, of course).
This construct tells the compiler that the expression EXP most likely will have the value C. The return value is EXP. __builtin_expect is meant to be used in an conditional expression. In almost all cases will it be used in the context of boolean expressions in which case it is much more convenient to define two helper macros:
These macros can then be used as in
Reference: https://www.akkadia.org/drepper/cpumemory.pdf
They are hint to the compiler to emit instructions that will cause branch prediction to favour the "likely" side of a jump instruction. This can be a big win, if the prediction is correct it means that the jump instruction is basically free and will take zero cycles. On the other hand if the prediction is wrong, then it means the processor pipeline needs to be flushed and it can cost several cycles. So long as the prediction is correct most of the time, this will tend to be good for performance.
Like all such performance optimisations you should only do it after extensive profiling to ensure the code really is in a bottleneck, and probably given the micro nature, that it is being run in a tight loop. Generally the Linux developers are pretty experienced so I would imagine they would have done that. They don't really care too much about portability as they only target gcc, and they have a very close idea of the assembly they want it to generate.
These are macros that give hints to the compiler about which way a branch may go. The macros expand to GCC specific extensions, if they're available.
GCC uses these to to optimize for branch prediction. For example, if you have something like the following
Then it can restructure this code to be something more like:
The benefit of this is that when the processor takes a branch the first time, there is significant overhead, because it may have been speculatively loading and executing code further ahead. When it determines it will take the branch, then it has to invalidate that, and start at the branch target.
Most modern processors now have some sort of branch prediction, but that only assists when you've been through the branch before, and the branch is still in the branch prediction cache.
There are a number of other strategies that the compiler and processor can use in these scenarios. You can find more details on how branch predictors work at Wikipedia: http://en.wikipedia.org/wiki/Branch_predictor
(general comment - other answers cover the details)
There's no reason that you should lose portability by using them.
You always have the option of creating a simple nil-effect "inline" or macro that will allow you to compile on other platforms with other compilers.
You just won't get the benefit of the optimization if you're on other platforms.
As per the comment by Cody, this has nothing to do with Linux, but is a hint to the compiler. What happens will depend on the architecture and compiler version.
This particular feature in Linux is somewhat mis-used in drivers. As osgx points out in semantics of hot attribute, any
hot
orcold
function called with in a block can automatically hint that the condition is likely or not. For instance,dump_stack()
is markedcold
so this is redundant,Future versions of
gcc
may selectively inline a function based on these hints. There have also been suggestions that it is notboolean
, but a score as in most likely, etc. Generally, it should be preferred to use some alternate mechanism likecold
. There is no reason to use it in any place but hot paths. What a compiler will do on one architecture can be completely different on another.They're hints to the compiler to generate the hint prefixes on branches. On x86/x64, they take up one byte, so you'll get at most a one-byte increase for each branch. As for performance, it entirely depends on the application -- in most cases, the branch predictor on the processor will ignore them, these days.
Edit: Forgot about one place they can actually really help with. It can allow the compiler to reorder the control-flow graph to reduce the number of branches taken for the 'likely' path. This can have a marked improvement in loops where you're checking multiple exit cases.