Specifically, if I have a series of `if`...`else if` statements, and I somehow know beforehand the relative probability that each statement will evaluate to true, how much difference in execution time does it make to sort them in order of probability? For example, should I prefer this:
```
if (highly_likely)
    //do something
else if (somewhat_likely)
    //do something
else if (unlikely)
    //do something
```
to this?
```
if (unlikely)
    //do something
else if (somewhat_likely)
    //do something
else if (highly_likely)
    //do something
```
It seems obvious that the sorted version would be faster; however, for readability or because of side effects, we might want to order them non-optimally. It's also hard to tell how well the CPU will do with branch prediction until you actually run the code.
So, in the course of experimenting with this, I ended up answering my own question for a specific case, however I'd like to hear other opinions/insights as well.
Important: this question assumes that the `if` statements can be arbitrarily reordered without having any other effect on the behavior of the program. In my answer, the three conditional tests are mutually exclusive and produce no side effects. Certainly, if the statements must be evaluated in a certain order to achieve some desired behavior, then the issue of efficiency is moot.
No, you should not, unless you are really sure that the target system is affected. By default, go by readability.
I highly doubt your results. I've modified your example a bit, so reversing the execution order is easier. Ideone rather consistently shows that reverse order is faster, though not by much; on certain runs even this occasionally flipped. I'd say the results are inconclusive. coliru reports no real difference as well. I can check the Exynos 5422 CPU on my ODROID-XU4 later on.
The thing is that modern CPUs have branch predictors. There is a great deal of logic dedicated to pre-fetching both data and instructions, and modern x86 CPUs are rather smart when it comes to this. Some slimmer architectures, like ARM cores or GPUs, might be vulnerable to this. But it is really highly dependent on both the compiler and the target system.
I would say that branch-ordering optimization is pretty fragile and ephemeral. Do it only as a final fine-tuning step.
Code:
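The original snippet is not preserved here; the following is a minimal sketch of such a benchmark (the function names, thresholds, and timing harness are all illustrative, not the code actually tested):

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Fixed seed so repeated runs see the same input.
std::vector<int> random_data(std::size_t n) {
    std::mt19937 gen(1);
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> v(n);
    for (auto& x : v) x = dist(gen);
    return v;
}

long count_ordered(const std::vector<int>& data) {
    long hits = 0;
    for (int v : data) {
        if (v < 80)       hits += 1;  // highly likely, tested first
        else if (v < 95)  hits += 2;  // somewhat likely
        else              hits += 3;  // unlikely
    }
    return hits;
}

long count_reversed(const std::vector<int>& data) {
    long hits = 0;
    for (int v : data) {
        if (v >= 95)      hits += 3;  // unlikely, tested first
        else if (v >= 80) hits += 2;
        else              hits += 1;
    }
    return hits;
}

// Wall-clock time for one run, in microseconds.
template <class F>
long long time_us(F f, const std::vector<int>& data) {
    auto t0 = std::chrono::steady_clock::now();
    volatile long r = f(data);  // volatile: keep the call from being elided
    (void)r;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

Timing both functions, then repeating with the calls swapped, makes it easy to spot measurement-order effects like the flipping mentioned above.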
It also depends on your compiler and the platform you're compiling for.
In theory, the most likely condition should cause control to jump as little as possible.
Typically the most likely condition should be first:
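For example, a fragment like this (the thresholds and the 0-100 input range are made up for illustration):

```cpp
// Illustrative only: the thresholds are invented.
int classify(int x) {
    if (x < 90)        // highly likely: straight-line fall-through
        return 1;
    else if (x < 99)   // somewhat likely
        return 2;
    else               // unlikely
        return 3;
}
```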
The most popular assembly languages are based on conditional branches that jump when the condition is true. That C code will likely be translated to pseudo-assembly like this:
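A sketch of such pseudo-assembly (the labels, mnemonics, and thresholds are made up for illustration):

```
    cmp  x, 90
    jge  .not_highly_likely   ; taken only rarely: the common case falls through
    ; ...highly likely work, straight-line, no taken branch...
    jmp  .done
.not_highly_likely:
    cmp  x, 99
    jge  .unlikely
    ; ...somewhat likely work...
    jmp  .done
.unlikely:
    ; ...unlikely work...
.done:
```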
This is because jumps force the CPU to flush its execution pipeline and stall while the program counter changes (on architectures that support pipelining, which are really common). Then it's up to the compiler, which may or may not apply sophisticated optimizations to arrange the statistically most probable condition so that control makes the fewest jumps.
The way I usually see this solved for high-performance code is keeping the order that is most readable, but providing hints to the compiler. Here is one example from Linux kernel:
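The original snippet isn't reproduced here; below is a self-contained sketch of the same pattern (the function and its details are illustrative, not actual kernel source, and `__builtin_expect` requires GCC or Clang):

```c
#include <string.h>

/* As defined in the kernel's include/linux/compiler.h: */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sketch in the style of the kernel's copy_from_user() family
   (names and logic are illustrative): */
long copy_data(char *to, const char *from, unsigned long n, int access_ok)
{
    unsigned long res = n;
    if (likely(access_ok)) {     /* the access check normally passes */
        memcpy(to, from, n);     /* stand-in for the real raw copy */
        res = 0;
    }
    if (unlikely(res))           /* error path: zero the uncopied tail */
        memset(to + (n - res), 0, res);
    return res;
}
```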
Here the assumption is that the access check will pass, and that no error is returned in `res`. Trying to reorder either of these if clauses would just confuse the code, but the `likely()` and `unlikely()` macros actually help readability by pointing out what is the normal case and what is the exception.

The Linux implementation of those macros uses GCC-specific features. It seems that Clang and the Intel C compiler support the same syntax, but MSVC doesn't have such a feature.
Just my 5 cents. It seems the effect of ordering if statements depends on:
- Probability of each if statement.
- Number of iterations, so the branch predictor can kick in.
- Likely/unlikely compiler hints, i.e. code layout.
To explore those factors, I benchmarked the following functions:

- ordered_ifs()
- reversed_ifs()
- ordered_ifs_with_hints()
- reversed_ifs_with_hints()

The `data` array contains random numbers between 0 and 100:
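The original benchmark code is not preserved here; the following reconstruction sketch shows what the four functions could look like (the branch bodies, thresholds, and macro names are illustrative, and `__builtin_expect` requires GCC or Clang):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

std::vector<int> make_data(std::size_t data_sz) {
    std::mt19937 gen(42);  // fixed seed so runs are comparable
    std::uniform_int_distribution<int> dist(0, 100);
    std::vector<int> data(data_sz);
    for (auto& v : data) v = dist(gen);
    return data;
}

// check_point: probability in % of the highly likely branch
int64_t ordered_ifs(const std::vector<int>& data, int check_point) {
    int64_t sum = 0;
    for (int v : data) {
        if (v < check_point)       sum += 1;  // highly likely, tested first
        else if (v == check_point) sum += 2;
        else                       sum += 3;
    }
    return sum;
}

int64_t reversed_ifs(const std::vector<int>& data, int check_point) {
    int64_t sum = 0;
    for (int v : data) {
        if (v > check_point)       sum += 3;  // least likely, tested first
        else if (v == check_point) sum += 2;
        else                       sum += 1;
    }
    return sum;
}

int64_t ordered_ifs_with_hints(const std::vector<int>& data, int check_point) {
    int64_t sum = 0;
    for (int v : data) {
        if (LIKELY(v < check_point))         sum += 1;
        else if (UNLIKELY(v == check_point)) sum += 2;
        else                                 sum += 3;
    }
    return sum;
}

int64_t reversed_ifs_with_hints(const std::vector<int>& data, int check_point) {
    int64_t sum = 0;
    for (int v : data) {
        if (UNLIKELY(v > check_point))       sum += 3;
        else if (UNLIKELY(v == check_point)) sum += 2;
        else                                 sum += 1;
    }
    return sum;
}
```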
The Results
The following results are for an Intel i5 @ 3.2 GHz and G++ 6.3.0. The first argument is the check_point (i.e., the probability in % of the highly likely if statement); the second argument is data_sz (i.e., the number of iterations).
Analysis
1. The Ordering Does Matter
For 4K iterations and an (almost) 100% probability of the highly likely statement, the difference is a huge 223%:
For 4K iterations and a 50% probability of the highly likely statement, the difference is about 14%:
2. Number of Iterations Does Matter
The difference between 4K and 8K iterations for an (almost) 100% probability of the highly likely statement is about two times (as expected):
But the difference between 4K and 8K iterations for a 50% probability of the highly likely statement is 5.5 times:
Why is that so? Because of branch predictor misses. Here are the branch misses for each case mentioned above:
So on my i5 the branch predictor fails spectacularly for not-so-likely branches and large data sets.
3. Hints Help a Bit
For 4K iterations, the results are somewhat worse for 50% probability and somewhat better for probabilities close to 100%:
But for 8K iterations the results are always a bit better:
So, the hints also help, but just a tiny bit.

The overall conclusion is: always benchmark the code, because the results may surprise you.
Hope that helps.
I made up the following test to time the execution of two different `if`...`else if` blocks, one sorted in order of probability, the other sorted in reverse order:

Using MSVC 2017 with /O2, the results show that the sorted version is consistently about 28% faster than the unsorted version. Per luk32's comment, I also switched the order of the two tests, which makes a noticeable difference (22% vs. 28%). The code was run under Windows 7 on an Intel Xeon E5-2697 v2. This is, of course, very problem-specific and should not be interpreted as a conclusive answer.
I decided to rerun the test on my own machine using luk32's code. I had to change it because my Windows (or compiler) treats the high-resolution clock as having 1 ms resolution, using

mingw32-g++.exe -O3 -Wall -std=c++11 -fexceptions -g
GCC made the same transformation to both original versions of the code.

Note that only the first two conditions are tested, as the third must always be true; GCC is a kind of Sherlock here.
Reverse
So this doesn't tell us much, except that the last case doesn't need branch prediction.
Now I tried all 6 combinations of the ifs; the top 2 are the original reverse and sorted. high is >= 95, low is < 20, mid is 20-94, with 10,000,000 iterations each.

So why is the order high, low, mid faster (marginally)? Because the most unpredictable condition is last and therefore never runs through the branch predictor.
So the branches will be predicted taken, taken, and remainder, with 6% + (0.94 ×) 20% mispredicts.
"Sorted"
The branches will be predicted not taken, not taken, and Sherlock, with 25% + (0.75 ×) 24% mispredicts.
That gives an 18-23% difference (vs. the measured difference of ~9%), but we need to count cycles instead of mispredict percentages.
Let's assume a 17-cycle mispredict penalty on my Nehalem CPU, that each check takes 1 cycle to issue (4-5 instructions), and that the loop takes one cycle too. The data dependencies are the counters and the loop variables, but once the mispredicts are out of the way, they shouldn't influence the timing.
So for "reverse", we get the timings (this should be the formula used in Computer Architecture: A Quantitative Approach IIRC).
and the same for "sorted"
(8.26-7.24)/8.26 = 13.8% vs. ~9% measured (close to the measured!?!).
So what the OP considered obvious is not so obvious.
Given these tests, other tests with more complicated code or more data dependencies will certainly differ, so measure your own case.
Changing the order of the tests changed the results, but that could be because of different alignment of the loop start, which ideally should be 16-byte aligned on all newer Intel CPUs but isn't in this case.