This question already has an answer here: Tell gcc to specifically unroll a loop (3 answers)
I have the following loop that I am running on an ARM processor.
// pin here is pointer to some part of an array
for (i = 0; i < v->numelements; i++)
{
    pe = pptr[i];
    peParent = pe->parent;
    SPHERE *ps = (SPHERE *)(pe->data);
    pin[0] = FLOAT2FIX(ps->rad2);
    pin[1] = *peParent->procs->pe_intersect == &SphPeIntersect;
    fixifyVector( &pin[2], ps->center ); // Is an inline function
    pin = pin + 5;
}
Judging by the slow performance of the loop, the compiler was unable to unroll it: when I unroll it manually, it becomes quite fast. I think the compiler is getting confused by the pin pointer. Can we use the restrict keyword to help the compiler here, or is restrict reserved for function parameters only? In general, how can we tell the compiler to unroll the loop and not worry about the pin pointer?
To tell gcc to unroll loops globally, you can use the optimization flag -funroll-loops.
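To unroll loops only in a specific function, you can apply the following attribute to that function (the optimize attribute works per function, not per loop):
__attribute__((optimize("unroll-loops")))
See this answer for more details.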
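As a minimal sketch of the two narrower options: the function names sum_attr and sum_pragma are hypothetical, the optimize attribute is GCC-specific, and the #pragma GCC unroll form assumes GCC 8 or later. Other compilers may ignore either hint or warn about it, and both are requests rather than guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: ask GCC to apply -funroll-loops to this function only.
   The optimize attribute is attached to a function, not to a single loop. */
__attribute__((optimize("unroll-loops")))
int sum_attr(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical example: request an unroll factor of 4 for one loop only.
   Assumes GCC 8 or later; the pragma must directly precede the loop. */
int sum_pragma(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int s = 0;
#pragma GCC unroll 4
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Comparing the generated assembly (for example with gcc -O2 -S) is the most direct way to check whether the hint was honored.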
Edit
If the compiler cannot determine the number of iterations of the loop upon entry, you will need to use -funroll-all-loops. Note the warning in the documentation: "Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly."
If you extend the pptr array by one element, you can use the pld (preload data) instruction to prefetch the next element:
__asm__ __volatile__("pld\t[%0]" :: "r" (pptr[i+1]));
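A more portable way to express the same hint, sketched under some assumptions: GCC and Clang provide __builtin_prefetch, which on ARM targets that support it is typically lowered to pld. The ELEM type and the sum_values function are hypothetical stand-ins for the question's structures. This version guards the index instead of padding the array, so it never reads past the last pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the question's element type. */
typedef struct {
    int value;
} ELEM;

/* Sum the values behind an array of pointers, prefetching the next
   element while the current one is being processed. */
int sum_values(ELEM *const *pptr, size_t n)
{
    assert(pptr != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(pptr[i + 1]); /* a hint only; the result is unchanged */
        total += pptr[i]->value;
    }
    return total;
}
```

Whether the prefetch pays off depends on the core and the memory layout, so it is worth timing the loop with and without it.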
Alternatively, you may need to preload the next peParent and SPHERE *ps. The loop overhead on an ARM is very small, so unrolling by itself is unlikely to be a significant benefit. There are no loop variable constants. It is more likely that, once you have unrolled the loop, the compiler's scheduler is able to fetch data in advance of its use.
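The idea of loading the next element's data before it is needed can be sketched by hand as follows. The SPHERE and ELEMENT definitions here are simplified, hypothetical stand-ins for the question's types, and copy_rad2 is an illustrative name, not code from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the question's types. */
typedef struct { int rad2; } SPHERE;
typedef struct { void *data; } ELEMENT;

/* Copy each sphere's rad2 into out[], issuing the load of the next
   element's data pointer one iteration before it is used. */
void copy_rad2(ELEMENT *const *pptr, size_t n, int *out)
{
    assert(out != NULL || n == 0);
    if (n == 0)
        return;
    const SPHERE *ps = (const SPHERE *)pptr[0]->data;  /* preload the first element */
    for (size_t i = 0; i < n; i++) {
        const SPHERE *next = ps;
        if (i + 1 < n)
            next = (const SPHERE *)pptr[i + 1]->data;  /* start the next load early */
        out[i] = ps->rad2;                              /* use the current element */
        ps = next;
    }
}
```

An optimizing compiler may already schedule the loads this way, so this is only worth keeping if measurements show a gain.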
You have not presented enough of the code to see the data dependencies. There may be other variables that would benefit from being preloaded. Giving a complete example would probably help everyone answer your question.