SIMD the following code

Posted 2019-07-30 20:33

How can I SIMD-ize the following C code (using SIMD intrinsics, of course)? I'm having trouble understanding SIMD intrinsics, so this would be a big help:

int sum_naive( int n, int *a )
{
    int sum = 0;
    for( int i = 0; i < n; i++ )
        sum += a[i];
    return sum;
}

Answer 1:

Here is a very simple implementation (warning: untested code):

#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>     // int32_t

int32_t sum_array(const int32_t a[], const int n)
{
    __m128i vsum = _mm_set1_epi32(0);       // initialise vector of four partial 32 bit sums
    int32_t sum;
    int i;

    for (i = 0; i < n; i += 4)
    {
        __m128i v = _mm_load_si128((const __m128i *)&a[i]);  // aligned load of 4 x 32 bit values
        vsum = _mm_add_epi32(vsum, v);      // accumulate to 32 bit partial sum vector
    }
    // horizontal add of four 32 bit partial sums and return result
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    sum = _mm_cvtsi128_si32(vsum);
    return sum;
}
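To make the two shift-and-add steps at the end of `sum_array` easier to follow, here is a small sketch that performs the same horizontal reduction and copies the lane contents out after each step. The function name `hsum_epi32_traced` and its `step1`/`step2` output arrays are hypothetical, added only for illustration; lane 0 is the lowest 32 bits of the register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: horizontal sum of the four 32-bit lanes of v,
   using the same steps as sum_array. If non-NULL, step1/step2 receive
   the lane contents after each step (lane 0 first). */
static int32_t hsum_epi32_traced(__m128i v, int32_t step1[4], int32_t step2[4])
{
    /* Step 1: shift right by 8 bytes (two lanes, zero-filled) and add.
       Lanes {v0, v1, v2, v3} become {v0+v2, v1+v3, v2, v3}. */
    v = _mm_add_epi32(v, _mm_srli_si128(v, 8));
    if (step1)
        _mm_storeu_si128((__m128i *)step1, v);

    /* Step 2: shift right by 4 bytes (one lane) and add.
       Lane 0 becomes (v0+v2) + (v1+v3), the full sum. */
    v = _mm_add_epi32(v, _mm_srli_si128(v, 4));
    if (step2)
        _mm_storeu_si128((__m128i *)step2, v);

    /* Only lane 0 is meaningful now; extract it. */
    return _mm_cvtsi128_si32(v);
}
```

For example, with lanes {1, 2, 3, 4} (as built by `_mm_setr_epi32(1, 2, 3, 4)`), step 1 gives {4, 6, 3, 4}, step 2 gives {10, 9, 7, 4}, and the function returns 10. The upper lanes hold partial leftovers that are simply ignored.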

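Note that the input array a[] must be 16-byte aligned, because `_mm_load_si128` requires an aligned address, and n must be a multiple of 4, because the loop always processes four elements per iteration and has no handling for leftover elements.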
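If those two restrictions are a problem, one possible variant (a sketch, not part of the original answer) swaps the aligned load for the unaligned `_mm_loadu_si128` and finishes the last 0 to 3 elements with an ordinary scalar loop. The name `sum_array_any` is hypothetical. Like `sum_naive`, it does no overflow checking.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_add_epi32, ... */
#include <stdint.h>

/* Hypothetical variant of sum_array: a[] need not be 16-byte aligned
   and n need not be a multiple of 4. */
int32_t sum_array_any(const int32_t a[], int n)
{
    __m128i vsum = _mm_setzero_si128();  /* four 32-bit partial sums */
    int i = 0;

    /* Vector loop: four elements per iteration, unaligned loads. */
    for (; i + 4 <= n; i += 4)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)&a[i]);
        vsum = _mm_add_epi32(vsum, v);
    }

    /* Horizontal add of the four partial sums, as in sum_array. */
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    int32_t sum = _mm_cvtsi128_si32(vsum);

    /* Scalar tail: the remaining n % 4 elements, if any. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

For example, summing {1, 2, 3, 4, 5, 6, 7} with n = 7 runs the vector loop once (1..4), then adds 5, 6 and 7 in the scalar tail, giving 28. Keeping the SIMD loop identical to the original and handling the remainder in plain C keeps the vector part simple.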



Source: SIMD the following code
Tags: c x86 sse simd