C++ SIMD: Store uint64_t value after bitwise and o

Published 2019-07-09 12:10

Question:

I am trying to do a bitwise & between elements of two arrays of uint64_t integers and then store the result in another array. This is my program:

#include <emmintrin.h>
#include <nmmintrin.h>
#include <chrono>


int main()
{

  uint64_t data[200];
  uint64_t data2[200];
  uint64_t data3[200];
  __m128i* ptr = (__m128i*) data;
  __m128i* ptr2 = (__m128i*) data2;
  uint64_t* ptr3 = data3;

  for (int i = 0; i < 100; ++i, ++ptr, ++ptr2, ptr3 += 2)
    _mm_store_ps(ptr3, _mm_and_si128(*ptr, *ptr2));

}

However, I get this error:

test.cpp:17:50: error: cannot convert ‘uint64_t* {aka long unsigned int*}’ to ‘float*’ for argument ‘1’ to ‘void _mm_store_ps(float*, __m128)’
     _mm_store_ps(ptr3, _mm_and_si128(*ptr, *ptr2));

For some reason, the compiler thinks I'm copying to an array of floats. Is it possible to do what I am trying to do with arrays of uint64_t?

Answer 1:

You can use _mm_store_si128, the integer counterpart of _mm_store_ps.

First change pointer ptr3 to

  __m128i* ptr3 = (__m128i*) data3;

and then

  for (int i = 0; i < 100; ++i, ++ptr, ++ptr2, ++ptr3)
    _mm_store_si128(ptr3, _mm_and_si128(*ptr, *ptr2));
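
For reference, the same fix can be sketched as a small self-contained function. This is only an illustration: the name and_u64_aligned and its parameters are hypothetical and not part of the question, and it assumes an even element count and 16-byte-aligned buffers, because _mm_load_si128 and _mm_store_si128 require aligned addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_and_si128, _mm_store_si128

// Hypothetical helper (not from the question): out[i] = a[i] & b[i].
// Assumptions: n is even and all three pointers are 16-byte aligned.
void and_u64_aligned(const std::uint64_t* a, const std::uint64_t* b,
                     std::uint64_t* out, std::size_t n)
{
    assert(n % 2 == 0);
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);

    // Each __m128i carries two 64-bit lanes, so advance two elements per step.
    for (std::size_t i = 0; i < n; i += 2) {
        __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(out + i),
                        _mm_and_si128(va, vb));
    }
}
```

With the arrays from the question declared as alignas(16), the call would be and_u64_aligned(data, data2, data3, 200).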


Answer 2:

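You are calling _mm_store_ps, which stores four floats (its signature is void _mm_store_ps(float*, __m128)), although you actually want to store integers. Either use _mm_store_si128, or cast the destination pointer to __m128i* and assign through it, as in the code below.

You should also make sure the arrays are aligned to 16 bytes. Both _mm_store_si128 and dereferencing a __m128i* require a 16-byte-aligned address; for data that may be unaligned, use _mm_loadu_si128 and _mm_storeu_si128 instead.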

#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_and_si128

int main()
{
  // alignas(16) is the portable C++11 spelling of MSVC's __declspec(align(16)).
  alignas(16) uint64_t data[200] = {};
  alignas(16) uint64_t data2[200] = {};
  alignas(16) uint64_t data3[200];
  __m128i* ptr = (__m128i*) data;
  __m128i* ptr2 = (__m128i*) data2;
  __m128i* ptr3 = (__m128i*) data3;

  // Each __m128i holds two uint64_t values, so 100 iterations cover all 200 elements.
  for (int i = 0; i < 100; ++i, ++ptr, ++ptr2, ++ptr3)
    *ptr3 = _mm_and_si128(*ptr, *ptr2);
}
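
If the alignment of the buffers cannot be guaranteed, or the element count may be odd, a variant based on the unaligned intrinsics is one option. The following is a minimal sketch under those assumptions: the function name and_u64 and its signature are hypothetical, and it swaps in _mm_loadu_si128 / _mm_storeu_si128 plus a scalar loop for a leftover element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_and_si128, _mm_storeu_si128

// Hypothetical helper (not from the answers): out[i] = a[i] & b[i]
// for any n and any pointer alignment.
void and_u64(const std::uint64_t* a, const std::uint64_t* b,
             std::uint64_t* out, std::size_t n)
{
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));

    std::size_t i = 0;
    // Vector part: two 64-bit elements per iteration, no alignment requirement.
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_and_si128(va, vb));
    }
    // Scalar tail: handles the last element when n is odd.
    for (; i < n; ++i)
        out[i] = a[i] & b[i];
}
```

The aligned intrinsics remain the natural choice when the arrays are known to be 16-byte aligned; this version trades that guarantee for generality.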


Tags: c++ c++11 sse simd