I have a critical loop in my code with this shape :
int myloop(int a, .....){
/* some stuff */
// Critical loop
while(...){
/* Some Stuff */
if(a == 1){
// .....
}
else if(a == 2){
// .....
}
else if(a == 3){
// .....
}
else{
// ....
}
}
}
As the loop never touches the value of "a" the branch taken will never change, but as this loop is really heavy it will require to test the value of "a" many times, which is totally unnecessary. The best thing is probably to duplicate the loop, so that the "if" can be tested before the loop begins, but this would mean copying a lot of stuff common to both situations and will result in a very ugly code...
Is there any way to ask GCC/G++ to duplicate this code when it compiles it ? Or any other trick to avoid testing the value so many times ?
Thank you for your help !
Nathann
You can use an array with pointer to functions, if your conditions are mostly adjacent.
If you can be assure that you'll have no holes in your array and
a
will never be outside array bounds you can omitif ( func_array[ a ] && a < sizeof( func_array ) )
, reducing the code to no comparison at all.Anyway, this approach can be a bit clearer, and may be worth even if it gives no big performance gain.
Are the operations taken under each condition at all similar? In these situations I usually use the if statements to set the value of key objects or pointers to objects and then use those values in the remaining code... basically, maximize your use of indirection.
dim x as object
dim y as string
If (a==1) then
x=foo.object
y=bar.string
elseif (a==2)
x=yuk.object
y=yo.string
end if
while ...
dofunction(x,y)
end while
But, if your operations are wildly different, then it's already kind of ugly, and you might as well create multiple loops to save the clock cycles...
One common way of doing this is as follows:
Note that because
a
is passed as a literal constant whenmyloop_inline()
is called formmyloop()
the compiler can optimise away all irrelevant tests and dead code when it expands the inline function.You may want to take steps to ensure that
myloop_inline()
actually gets inlined, this includes compiling with optimisation enabled of course (e.g.-O3
), and in the case of e.g. gcc you might want to add__attribute__ ((always_inline))
.First of all, you can use a
switch
statement here:This may enable the compiler to produce quicker code, i.e. do a single computed jump rather than multiple checks against
a
.Refactor! What about putting the shared code into a separate function, probably declare it
inline
, and hope that the compiler follows the hint? Function inlining is a good way to let the compiler do code duplication (the other two ways being templates and the preprocessor, both are clearly inappropriate here).Of course it depends on what you do in your loop, but you should get the basic principle.
Last but not least: profile. Check if the loop really is the performance bottleneck. Have a look at the generated machine code. Chances are good that the compiler does already use most of the proposed optimizations.