I have managed to convert most of my SIMD code to use the vector extensions of GCC. However, I have not found a good solution for doing a broadcast such as the following:
__m256 areg0 = _mm256_broadcast_ss(&a[i]);
I want to do
__m256 areg0 = a[i];
As you can see in my answer at Mutiplying vector by constant using SSE, I managed to get broadcasts working in expressions that involve another SIMD register. The following works:
__m256 x,y;
y = x + 3.14159f; // broadcast x + 3.14159
y = 3.14159f*x; // broadcast 3.14159*x
but this won't work:
__m256 x;
x = 3.14159f; //should broadcast 3.14159 but does not work
How can I do this with GCC?
I think there is currently no direct way and you have to work around it using the syntax you already noticed:
__m256 zero={};
__m256 x=zero+3.14159f;
It may change in the future if we can agree on a good syntax, see PR 55726.
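A minimal, self-contained sketch of this workaround, assuming GCC: the type name v8sf and the function fill_pi are my own hypothetical names, with v8sf acting as a generic eight-float stand-in for __m256 so the snippet builds without immintrin.h.

```c
/* Hypothetical stand-in for __m256: eight floats in a GCC vector type. */
typedef float v8sf __attribute__((vector_size(32)));

/* Write 3.14159f into all eight elements of out[] using the
   "zero + scalar" workaround. fill_pi is an illustrative name. */
void fill_pi(float out[8])
{
    v8sf zero = {};            /* all lanes 0.0f (empty braces: GNU C extension) */
    v8sf x = zero + 3.14159f;  /* the scalar operand is applied to every lane */
    for (int i = 0; i < 8; i++)
        out[i] = x[i];         /* GCC allows subscripting vector types */
}
```

The mixed vector/scalar addition is what triggers the broadcast; a plain assignment of the scalar to the vector is still rejected.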
Note that if you want to create a vector { s, s, ... s } with a non-constant float s, the technique above only works with integers, or with floats and -fno-signed-zeros. You can tweak it to __m256 x=s-zero; and it will work unless you use -frounding-math. A last version, suggested by Z boson, is __m256 x=(zero+1.f)*s; which should work in most cases (except possibly with a compiler paranoid about sNaN).
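The signed-zero issue can be seen with s = -0.0f: under the default rounding mode, 0.0f + (-0.0f) is +0.0f, so the addition form loses the sign, while the subtraction and multiplication forms keep it. The sketch below assumes GCC with default floating-point flags (no -ffast-math, no -fno-signed-zeros); v8sf and all function names are hypothetical.

```c
#include <math.h>   /* signbit */

/* Hypothetical stand-in for __m256: eight floats in a GCC vector type. */
typedef float v8sf __attribute__((vector_size(32)));

/* Lane 0 of each broadcast variant for a run-time scalar s. */
float lane0_add(float s) { v8sf zero = {}; v8sf x = zero + s;         return x[0]; }
float lane0_sub(float s) { v8sf zero = {}; v8sf x = s - zero;         return x[0]; }
float lane0_mul(float s) { v8sf zero = {}; v8sf x = (zero + 1.f) * s; return x[0]; }

/* Returns 1 if the given variant preserves the sign of negative zero,
   0 otherwise. */
int keeps_negative_zero(float (*variant)(float))
{
    return signbit(variant(-0.0f)) != 0;
}
```

For ordinary nonzero values all three variants give back s unchanged; they differ only in corner cases such as this one.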
It turns out that with a precise floating-point model (e.g. with -O3), GCC cannot simplify x+0 to x because of signed zero, so x = zero+3.14159f produces inefficient code. However, GCC can simplify 1.0*x to just x, so the efficient solution in this case is:
__m256 x = ((__m256){} + 1)*3.14159f;
https://godbolt.org/g/5QAQkC
See this answer for more details.
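To compare the two forms yourself, something like the following can be pasted into a compiler explorer and built with, for example, -O3 -mavx. This is only a sketch: v8sf (a generic stand-in for __m256) and the function names are hypothetical, and the generated assembly will depend on the GCC version and flags.

```c
/* Hypothetical stand-in for __m256: eight floats in a GCC vector type. */
typedef float v8sf __attribute__((vector_size(32)));

/* Variant 1: zero + constant. */
void bcast_add(v8sf *out)
{
    *out = (v8sf){} + 3.14159f;
}

/* Variant 2: (zero + 1) * constant. */
void bcast_mul(v8sf *out)
{
    *out = ((v8sf){} + 1.0f) * 3.14159f;
}
```

Both functions store the same values; any difference is in the instructions the compiler emits. Writing through a pointer keeps the vector out of the function signature, which avoids ABI warnings when building without AVX enabled.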
A simpler solution is just x = 3.14159f - (__m256){} because x - 0 = x irrespective of signed zero.
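Applied to the a[i] broadcast from the question, the subtraction form could look roughly like this. It is a sketch under assumptions: v8sf is a generic stand-in for __m256, axpy8 is a hypothetical helper, and the default rounding mode is in effect (no -frounding-math).

```c
#include <string.h>  /* memcpy */

/* Hypothetical stand-in for __m256: eight floats in a GCC vector type. */
typedef float v8sf __attribute__((vector_size(32)));

/* y[j] += a_i * x[j] for eight elements, with a_i first broadcast
   into its own vector, as in the question's areg0. */
void axpy8(float a_i, const float x[8], float y[8])
{
    v8sf areg0 = a_i - (v8sf){};   /* broadcast: every lane holds a_i */
    v8sf xv, yv;
    memcpy(&xv, x, sizeof xv);     /* unaligned-safe loads */
    memcpy(&yv, y, sizeof yv);
    yv += areg0 * xv;
    memcpy(y, &yv, sizeof yv);     /* store the result back */
}
```

In a loop over i, the call would pass a[i] as a_i, which plays the role of _mm256_broadcast_ss(&a[i]) in the intrinsic version.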