Accessing __m128 fields across compilers

Posted 2019-05-25 08:36

Question:

I've noticed that accessing __m128 fields by index is possible in gcc, without using the union trick.

__m128 t;

float r(t[0] + t[1] + t[2] + t[3]);

I can also load a __m128 just like an array:

__m128 t{1.f, 2.f, 3.f, 4.f};

This is all in line with GCC's vector extensions. These, however, may not be available elsewhere. Are the loading and accessing features supported by the Intel compiler and MSVC?

Answer 1:

To load a __m128, you can write _mm_setr_ps(1.f, 2.f, 3.f, 4.f), which is supported by GCC, ICC, MSVC and clang.

As far as I know, clang and recent versions of GCC support accessing __m128 fields by index. I don't know how to do this in ICC or MSVC. I believe _mm_extract_ps works on all four compilers, but it requires SSE4.1 and returns an int holding the raw bits of the float rather than a float, which makes it painful to use.
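One way to read a single lane without _mm_extract_ps or the GCC index syntax is to shuffle the wanted lane into position 0 and read it with _mm_cvtss_f32. The sketch below is not from the thread: get_lane is a hypothetical helper name, and it assumes only SSE1 intrinsics from xmmintrin.h and a C++11 compiler.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _mm_cvtss_f32

// Hypothetical helper (not part of any library): returns lane I of v.
// I is a template parameter because _mm_shuffle_ps needs a compile-time
// constant for its shuffle mask.
template <int I>
inline float get_lane(__m128 v) {
    static_assert(I >= 0 && I < 4, "lane index must be in 0..3");
    // Copy lane I into every position, then read the lowest lane as a float.
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I)));
}
```

With t built by _mm_setr_ps(1.f, 2.f, 3.f, 4.f), get_lane<0>(t) reads the first argument and get_lane<3>(t) the last. The index must be known at compile time; for a runtime index, storing to an array is simpler.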



Answer 2:

If you want your code to work on other compilers, don't use those GCC extensions. Use the set/load/store intrinsics instead. _mm_setr_ps is fine for setting constant values but should not be used in a loop. To access elements, I normally store the values to an array first and then read the array.

If you have an array a, you should load from it and store to it with

__m128 t = _mm_loadu_ps(a);
_mm_storeu_ps(a, t);
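The store-then-read approach can be sketched as follows, using the sum from the question as the example. horizontal_sum is a hypothetical helper name introduced here for illustration; only _mm_storeu_ps comes from the answer.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_storeu_ps

// Hypothetical helper: adds the four lanes of t by spilling them to an array.
inline float horizontal_sum(__m128 t) {
    float lanes[4];
    // Unaligned store, so lanes needs no special alignment.
    _mm_storeu_ps(lanes, t);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

This replaces t[0] + t[1] + t[2] + t[3] from the question with code that does not depend on the GCC vector extensions.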

If the array is 16-byte aligned, you can use an aligned load/store instead, which is slightly faster on newer systems and much faster on older ones.

__m128 t = _mm_load_ps(a);
_mm_store_ps(a, t);

To get 16-byte aligned memory on the stack, use

__declspec(align(16)) const float a[] = ... // MSVC
__attribute__((aligned(16))) const float a[] = ... // GCC, ICC
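If a C++11 compiler can be assumed, the standard alignas specifier is a possible alternative to the two compiler-specific spellings above. This is an assumption beyond the original answer. The sketch below pairs it with the aligned load/store intrinsics; double_in_place and is_aligned16 are hypothetical helper names.

```cpp
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_store_ps, _mm_add_ps
#include <cstdint>      // std::uintptr_t

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: doubles four floats in place.
// p must be 16-byte aligned, because the aligned load/store
// intrinsics may fault on an unaligned address.
inline void double_in_place(float* p) {
    __m128 v = _mm_load_ps(p);
    v = _mm_add_ps(v, v);
    _mm_store_ps(p, v);
}
```

A caller would declare the buffer as alignas(16) float a[4] = {1.f, 2.f, 3.f, 4.f}; and then call double_in_place(a).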

For 16-byte aligned dynamic arrays use:

float *a = (float*)_mm_malloc(sizeof(float)*n, 16); //MSVC, GCC, ICC, MinGW
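Memory obtained from _mm_malloc has to be released with _mm_free rather than free. A minimal sketch of the allocate, use and release cycle follows; make_filled is a hypothetical helper, and the code assumes that including xmmintrin.h is enough to declare _mm_malloc and _mm_free on the compiler in use (some toolchains may need malloc.h or mm_malloc.h as well).

```cpp
#include <xmmintrin.h>  // SSE intrinsics; assumed to declare _mm_malloc/_mm_free
#include <cstddef>      // std::size_t

// Hypothetical helper: returns a 16-byte aligned buffer of n floats,
// each set to value, or nullptr if the allocation fails.
// n is assumed to be a multiple of 4 so the loop never writes past the end.
// The caller releases the buffer with _mm_free.
inline float* make_filled(std::size_t n, float value) {
    float* a = static_cast<float*>(_mm_malloc(sizeof(float) * n, 16));
    if (!a) return nullptr;
    const __m128 v = _mm_set1_ps(value);
    for (std::size_t i = 0; i < n; i += 4) {
        _mm_store_ps(a + i, v);  // aligned store is safe: a + i stays 16-byte aligned
    }
    return a;
}
```

For example, float* a = make_filled(8, 1.5f); yields eight floats equal to 1.5f, and the buffer is released with _mm_free(a).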