The GCC manual only shows examples where __builtin_expect() is placed around the entire condition of an 'if' statement.
I also noticed that GCC does not complain if I use it, for example, with a ternary operator, or in any arbitrary integral expression for that matter, even one that is not used in a branching context.
So, I wonder what the underlying constraints of its usage actually are.
Will it retain its effect when used in a ternary operation like this:
int foo(int i)
{
    return __builtin_expect(i == 7, 1) ? 100 : 200;
}
And what about this case:
int foo(int i)
{
    return __builtin_expect(i, 7) == 7 ? 100 : 200;
}
And this one:
int foo(int i)
{
    int j = __builtin_expect(i, 7);
    return j == 7 ? 100 : 200;
}
It apparently works for both ternary and regular if statements.
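As a side note on semantics: the GCC manual states that __builtin_expect(exp, c) evaluates to the value of exp, so all three forms from the question compute the same result, and the second argument is only a hint. A minimal check of that, assuming GCC or Clang; the names foo_a, foo_b, foo_c and check_variants_agree are hypothetical stand-ins, not from the question:

```c
#include <assert.h>

/* Hypothetical names for the three variants from the question. */
int foo_a(int i) { return __builtin_expect(i == 7, 1) ? 100 : 200; }
int foo_b(int i) { return __builtin_expect(i, 7) == 7 ? 100 : 200; }
int foo_c(int i)
{
    int j = __builtin_expect(i, 7); /* j holds the value of i; 7 is only a hint */
    return j == 7 ? 100 : 200;
}

/* The hint never changes results, only (possibly) the generated code. */
void check_variants_agree(void)
{
    for (int i = -16; i <= 16; ++i) {
        assert(foo_a(i) == foo_b(i));
        assert(foo_b(i) == foo_c(i));
    }
}
```

This only shows that the forms are interchangeable in terms of results; whether the hint survives into the generated code in each form is a separate question that needs a look at the assembly.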
First, let's take a look at the following three code samples, two of which use __builtin_expect (one in regular-if style, one in ternary style), and a third which does not use it at all.
builtin.c:
ternary.c:
nobuiltin.c:
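The three listings themselves are missing here. As a stand-in only, and not the author's original code, here is a minimal sketch of what such variants could look like, assuming (from the assembly discussion) that each one compares an input char against 'c'. All function names are hypothetical, and the three variants are placed in one file for brevity:

```c
#include <assert.h>

/* builtin.c (hypothetical): hint wrapped around a regular if condition. */
int test_builtin(char ch)
{
    if (__builtin_expect(ch == 'c', 1))
        return 1;
    return 0;
}

/* ternary.c (hypothetical): the same hint inside a ternary expression. */
int test_ternary(char ch)
{
    return __builtin_expect(ch == 'c', 1) ? 1 : 0;
}

/* nobuiltin.c (hypothetical): plain comparison, no hint. */
int test_nobuiltin(char ch)
{
    return ch == 'c' ? 1 : 0;
}

/* All three must agree on results; only code generation may differ. */
void check_same_results(void)
{
    const char inputs[] = { 'a', 'c', 'z', '\0' };
    for (unsigned k = 0; k < sizeof inputs; ++k) {
        assert(test_builtin(inputs[k]) == test_nobuiltin(inputs[k]));
        assert(test_ternary(inputs[k]) == test_nobuiltin(inputs[k]));
    }
}
```

Splitting these into separate files and compiling each with gcc -S would give assembly to compare, though the output of this sketch may not match the listings discussed here.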
When compiled with -O3, all three result in the same assembly. However, when the -O flag is left out (on GCC 4.7.2), both ternary.c and builtin.c have the same assembly listing (where it matters):
builtin.s:
ternary.s:
Whereas nobuiltin.c does not:
The relevant part:
Basically, __builtin_expect causes extra code (sete %al ...) to be executed before the je .L2, which then branches on the outcome of testl %eax, %eax, a value the CPU is more likely to predict as being 1 (naive assumption, here), instead of on the direct comparison of the input char with 'c'. In the nobuiltin.c case, no such code exists, and the je/jne directly follows the comparison with 'c' (cmp $99). Remember, branch prediction is mainly done in the CPU, and here GCC is simply "laying a trap" for the CPU branch predictor to assume which path will be taken (via the extra code and the switching of je and jne). I do not have a source for this, though, as Intel's official optimization manual does not mention treating first encounters with je vs. jne differently for branch prediction! I can only assume the GCC team arrived at this via trial and error.

I am sure there are better test cases where GCC's branch prediction can be seen more directly (instead of observing hints to the CPU), though I do not know how to emulate such a case concisely. (Guess: it would likely involve loop unrolling during compilation.)
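One possible experiment along those lines, offered as a sketch rather than a verified result: mark a rarely taken branch inside a loop as unlikely and compare the optimized assembly with and without the hint. The likely/unlikely macro names are a common convention rather than part of GCC, and sum_nonnegative is a hypothetical function made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Convenience macros; !!(x) normalizes any truthy value to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sums the array, or returns -1 as soon as a
   negative entry is seen. The early return is hinted as the rare path. */
long sum_nonnegative(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (unlikely(v[i] < 0))
            return -1;
        sum += v[i];
    }
    return sum;
}
```

Compiling this with gcc -O2 -S, once as written and once with the unlikely() wrapper removed, and diffing the two outputs should show whether the compiler arranged the blocks differently. Whether any difference appears depends on the GCC version and target, so an identical listing is a possible outcome.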