I've never written assembly code for SSE optimization, so sorry if this is a noob question. This article explains how to vectorize a for loop with a conditional statement. However, my code (taken from here) is of the form:
    for (int j=-halfHeight; j<=halfHeight; ++j)
    {
        for(int i=-halfWidth; i<=halfWidth; ++i)
        {
            const float rx = ofsx + j * a12;
            const float ry = ofsy + j * a22;
            float wx = rx + i * a11;
            float wy = ry + i * a21;
            const int x = (int) floor(wx);
            const int y = (int) floor(wy);
            if (x >= 0 && y >= 0 && x < width && y < height)
            {
                // compute weights
                wx -= x; wy -= y;
                // bilinear interpolation
                *out++ =
                    (1.0f - wy) * ((1.0f - wx) * im.at<float>(y,x)   + wx * im.at<float>(y,x+1)) +
                    (       wy) * ((1.0f - wx) * im.at<float>(y+1,x) + wx * im.at<float>(y+1,x+1));
            } else {
                *out++ = 0;
            }
        }
    }
So, from my understanding, there are several differences with the linked article:
- Here we have a nested for loop: I've always seen a one-level for in vectorization, never a nested loop.
- The if condition is based on scalar values (x and y) and not on the array: how can I adapt the linked example to this?
- The out index isn't based on i or j (so it's not out[i] or out[j]): how can I fill out in this way?
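To make the third point concrete, here is a minimal sketch of my own (not from the linked article) showing that the `*out++` pattern seems to be equivalent to an explicit index computed from i and j. The names `patchIndex`, `fillSequential` and `fillIndexed` are hypothetical, and the stored value `100 * j + i` is just a placeholder for the interpolated pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat position that *out++ reaches at loop indices (i, j).
inline std::size_t patchIndex(int i, int j, int halfWidth, int halfHeight)
{
    assert(i >= -halfWidth && i <= halfWidth);
    assert(j >= -halfHeight && j <= halfHeight);
    const std::size_t patchWidth = 2 * static_cast<std::size_t>(halfWidth) + 1;
    return static_cast<std::size_t>(j + halfHeight) * patchWidth
         + static_cast<std::size_t>(i + halfWidth);
}

// Same traversal as the loop in the question: the pointer advances by one
// element on every inner iteration.
std::vector<float> fillSequential(int halfWidth, int halfHeight)
{
    const std::size_t count = static_cast<std::size_t>(2 * halfWidth + 1)
                            * static_cast<std::size_t>(2 * halfHeight + 1);
    std::vector<float> buf(count);
    float* out = buf.data();
    for (int j = -halfHeight; j <= halfHeight; ++j)
        for (int i = -halfWidth; i <= halfWidth; ++i)
            *out++ = 100.0f * j + i;   // placeholder for the interpolated value
    return buf;
}

// Same result, but every write goes to an address derived from (i, j).
std::vector<float> fillIndexed(int halfWidth, int halfHeight)
{
    const std::size_t count = static_cast<std::size_t>(2 * halfWidth + 1)
                            * static_cast<std::size_t>(2 * halfHeight + 1);
    std::vector<float> buf(count);
    for (int j = -halfHeight; j <= halfHeight; ++j)
        for (int i = -halfWidth; i <= halfWidth; ++i)
            buf[patchIndex(i, j, halfWidth, halfHeight)] = 100.0f * j + i;
    return buf;
}
```

If this reading is right, the output position depends only on the loop indices, so the pointer increment by itself should not be what prevents vectorization of the inner loop.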
In particular, I'm confused because in the examples I've seen the for indexes are always used as array indexes, while here they are used to compute variables, and the output vector is advanced one element per iteration.
I'm using icpc with -O3 -xCORE-AVX2 -qopt-report=5 and a bunch of other optimization flags. According to Intel Advisor, this loop is not vectorized, and using #pragma omp simd generates warning #15552: loop was not vectorized with "simd".
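For reference, this is the kind of restructuring I have in mind, as a rough sketch under several assumptions rather than a known-good solution. The image is a plain row-major float buffer (`img`, a hypothetical stand-in for `im.at<float>`), the output is indexed from i and j, and the if/else is replaced by a select so both paths are always computed. To keep every load in bounds, the indices are clamped, which means border pixels reuse the edge value instead of reading `x+1` / `y+1` as the original does. I have not verified that icpc actually vectorizes this form.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical rewrite of the loop in the question (samplePatch is my own name).
// img: row-major float image of size width * height.
// out: buffer of (2*halfWidth+1) * (2*halfHeight+1) floats.
void samplePatch(const float* img, int width, int height,
                 float ofsx, float ofsy,
                 float a11, float a12, float a21, float a22,
                 int halfWidth, int halfHeight, float* out)
{
    assert(width > 0 && height > 0);
    const int patchWidth = 2 * halfWidth + 1;
    for (int j = -halfHeight; j <= halfHeight; ++j)
    {
        // Row-invariant terms hoisted out of the inner loop.
        const float rx = ofsx + j * a12;
        const float ry = ofsy + j * a22;
        float* row = out + (j + halfHeight) * patchWidth;
        for (int i = -halfWidth; i <= halfWidth; ++i)
        {
            float wx = rx + i * a11;
            float wy = ry + i * a21;
            const int x = static_cast<int>(std::floor(wx));
            const int y = static_cast<int>(std::floor(wy));
            const bool inside = x >= 0 && y >= 0 && x < width && y < height;

            // Clamp so the four loads are valid even when (x, y) is outside.
            // Assumption: edge replication at the border is acceptable.
            const int x0 = std::min(std::max(x, 0), width - 1);
            const int y0 = std::min(std::max(y, 0), height - 1);
            const int x1 = std::min(x0 + 1, width - 1);
            const int y1 = std::min(y0 + 1, height - 1);

            // Interpolation weights.
            wx -= x;
            wy -= y;

            const float top    = (1.0f - wx) * img[y0 * width + x0] + wx * img[y0 * width + x1];
            const float bottom = (1.0f - wx) * img[y1 * width + x0] + wx * img[y1 * width + x1];
            const float value  = (1.0f - wy) * top + wy * bottom;

            // Select instead of branch: outside samples become 0.
            row[i + halfWidth] = inside ? value : 0.0f;
        }
    }
}
```

The loads are still gathers (the addresses depend on x and y), so even if the compiler accepts this, I'm not sure whether it would be profitable on AVX2.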