How to tell the compiler to unroll this loop [dupl

2019-06-04 19:18发布

问题:

This question already has an answer here:

  • Tell gcc to specifically unroll a loop 3 answers

I have the following loop that I am running on an ARM processor.

// pin here is pointer to some part of an array
for (i = 0; i < v->numelements; i++)
{
    pe   = pptr[i];
    peParent = pe->parent;

    SPHERE  *ps = (SPHERE *)(pe->data);

    pin[0] = FLOAT2FIX(ps->rad2);
    pin[1] = *peParent->procs->pe_intersect == &SphPeIntersect;
    fixifyVector( &pin[2], ps->center ); // Is an inline function

    pin = pin + 5;
}

By the slow performance of the loop, I can judge that the compiler was unable to unroll this loop, as when I manually do the unrolling, it becomes quite fast. I think the compiler is getting confused by the pin pointer. Can we use restrict keyword to help the compiler here, or is restrict only reserved for function parameters? In general how can we tell the compiler to unroll it and don't worry about the pin pointer.

回答1:

To tell gcc to unroll all loops you can use the optimization flag -funroll-loops.

To unroll only a specific loop you can use:

__attribute__((optimize("unroll-loops")))

see this answer for more details.

Edit

If the compiler cannot determine the number of iterations of the loop upon entry you will need to use -funroll-all-loops. Note that from the documentation: "Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly."



回答2:

If you extent pptr size by one, you can use the pld instruction.

  __asm__ __volatile__("pld\t[%0]" :: "r" (pptr[i+1]));

Or alternatively you may need to pre-load the next peParent and SPHERE *ps. The loop overhead on an ARM is very small. It is unlikely that un-rolling the loop will be a significant benefit. There are no loop variable constants. It is more likely that the compiler's scheduler is able to fetch advanced data before it is used when you have un-rolled the loop.

You have not presented all of the code to see the data dependencies. There maybe other variables that would benefit from being pre-loaded. Giving a complete example would probably help everyone answer your question.