adding the components of an SSE register

Posted 2019-04-03 15:58

Question:

I want to add the four components of an SSE register to get a single float. This is how I do it now:

float a[4];
_mm_storeu_ps(a, foo128);
float x = a[0] + a[1] + a[2] + a[3];

Is there an SSE instruction that directly achieves this?

Answer 1:

You could probably use the SSE3 instruction HADDPS, via its compiler intrinsic _mm_hadd_ps.

For example, see http://msdn.microsoft.com/en-us/library/yd9wecaa(v=vs.80).aspx

If you have two registers v1 and v2:

v = _mm_hadd_ps(v1, v2);
v = _mm_hadd_ps(v, v);

Now element 0 of v contains the sum of v1's components, and element 1 contains the sum of v2's components (elements 2 and 3 repeat the same two sums).
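The two-register pattern above can be sketched as a small self-contained function. The names hsum2_sse3 and HSUM_TARGET_SSE3 are made up for this illustration; the macro is only an assumed convenience so that GCC or Clang will compile the function without -msse3 on the command line, and it expands to nothing on other compilers.

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (also pulls in the SSE/SSE2 headers) */

/* Hypothetical helper macro: enables SSE3 for one function on GCC/Clang. */
#if defined(__GNUC__) || defined(__clang__)
#define HSUM_TARGET_SSE3 __attribute__((target("sse3")))
#else
#define HSUM_TARGET_SSE3
#endif

/* Sums the four floats of v1 into *sum1 and the four floats of v2 into *sum2. */
static HSUM_TARGET_SSE3 void hsum2_sse3(__m128 v1, __m128 v2,
                                        float *sum1, float *sum2)
{
    /* [a0+a1, a2+a3, b0+b1, b2+b3] */
    __m128 v = _mm_hadd_ps(v1, v2);
    /* [sum(a), sum(b), sum(a), sum(b)] */
    v = _mm_hadd_ps(v, v);

    /* Element 0 holds the sum of v1. */
    *sum1 = _mm_cvtss_f32(v);
    /* Broadcast element 1 to element 0 to read the sum of v2. */
    *sum2 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)));
}
```

To sum a single register, pass it as both arguments; both outputs then hold the same total.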



Answer 2:

If you want your code to work on pre-SSE3 CPUs (which do not support _mm_hadd_ps), you might use the following code. It uses more instructions, but decodes to fewer micro-ops on most CPUs.

 __m128 temp = _mm_add_ps(_mm_movehl_ps(foo128, foo128), foo128);
 float x;
 _mm_store_ss(&x, _mm_add_ss(temp, _mm_shuffle_ps(temp, temp, 1)));
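As a rough sketch, the same SSE1-only reduction can be wrapped in a function with the contents of each intermediate register spelled out. The name hsum_sse1 is made up for this illustration. Note that _mm_shuffle_ps takes two source registers plus an immediate, so the same register is passed twice here.

```c
#include <xmmintrin.h>  /* SSE1 only */

/* Returns v0 + v1 + v2 + v3 for v = [v0, v1, v2, v3]. */
static float hsum_sse1(__m128 v)
{
    /* Move the high pair down: [v2, v3, v2, v3] */
    __m128 hi = _mm_movehl_ps(v, v);
    /* [v0+v2, v1+v3, (upper two elements unused)] */
    __m128 pair = _mm_add_ps(v, hi);
    /* Bring element 1 (v1+v3) into element 0. */
    __m128 odd = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    /* Scalar add in element 0: (v0+v2) + (v1+v3) */
    return _mm_cvtss_f32(_mm_add_ss(pair, odd));
}
```

The additions are grouped as (v0+v2) + (v1+v3), so the rounding can differ slightly from the left-to-right scalar sum a[0] + a[1] + a[2] + a[3].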


Answer 3:

Well, I don't know of a single instruction that does this, but it can be done by calling _mm_hadd_ps() twice on the same register; after the second call, every element holds the total.