SSE: reciprocal if not zero

Posted 2019-04-06 14:09

How can I take the reciprocal (inverse) of floats with SSE instructions, but only for non-zero values?

Background below:

I want to normalize an array of vectors so that each dimension has the same average. In C this can be coded as:

#include <string.h> // memset

float vectors[num * dims]; // input data: num vectors of dims floats, row-major

// step 1. compute the sum on each dimension
float norm[dims];
memset(norm, 0, dims * sizeof(float));
for(int i = 0; i < num; i++) for(int j = 0; j < dims; j++)
    norm[j] += vectors[i * dims + j];
// step 2. convert each nonzero sum to the reciprocal of the average (num / sum)
for(int j = 0; j < dims; j++) if(norm[j] != 0.0f) norm[j] = (float)num / norm[j];
// step 3. normalize the data
for(int i = 0; i < num; i++) for(int j = 0; j < dims; j++)
    vectors[i * dims + j] *= norm[j];
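
For comparison with the scalar loops, here is a minimal sketch of how the two heavy loops (steps 1 and 3) might look with SSE intrinsics, leaving step 2 scalar because it touches only dims elements. It assumes dims is a multiple of 4 and uses unaligned loads and stores (_mm_loadu_ps / _mm_storeu_ps) so no particular alignment is required. The function name normalize_sse and its signature are hypothetical.

```c
#include <xmmintrin.h> /* SSE1: _mm_loadu_ps, _mm_add_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper: normalizes num row-major vectors of dims floats in place.
   Assumes dims is a multiple of 4; norm must have room for dims floats. */
static void normalize_sse(float *vectors, int num, int dims, float *norm)
{
    /* step 1. column sums, four dimensions per instruction */
    for (int j = 0; j < dims; j++)
        norm[j] = 0.0f;
    for (int i = 0; i < num; i++)
        for (int j = 0; j < dims; j += 4) {
            __m128 sum = _mm_loadu_ps(&norm[j]);
            __m128 row = _mm_loadu_ps(&vectors[i * dims + j]);
            _mm_storeu_ps(&norm[j], _mm_add_ps(sum, row));
        }

    /* step 2. scalar: only dims elements, cheap next to steps 1 and 3 */
    for (int j = 0; j < dims; j++)
        if (norm[j] != 0.0f)
            norm[j] = (float)num / norm[j];

    /* step 3. scale every row by norm, four dimensions per instruction */
    for (int i = 0; i < num; i++)
        for (int j = 0; j < dims; j += 4) {
            float *p = &vectors[i * dims + j];
            _mm_storeu_ps(p, _mm_mul_ps(_mm_loadu_ps(p), _mm_loadu_ps(&norm[j])));
        }
}
```

For example, with num = 2, dims = 4 and rows {1, 2, 0, 4} and {3, 2, 0, 4}, the column sums are {4, 4, 0, 8}, norm becomes {0.5, 0.5, 0, 0.25}, and the rows become {0.5, 1, 0, 1} and {1.5, 1, 0, 1}.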

Now, for performance reasons, I want to do this using SSE intrinsics. Steps 1 and 3 are easy, but I'm stuck at step 2. I can't find any code sample or obvious SSE instruction that takes the reciprocal of a value only if it is not zero. For the division, _mm_rcp_ps does the trick, and maybe I could combine it with a conditional move, but how do I get a mask indicating which components are zero?
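
Before SSE4.1 there is no blend instruction, but the compare intrinsics produce exactly the kind of mask asked about: _mm_cmpeq_ps sets each lane to all ones where the comparison holds and to all zeros otherwise. A minimal sketch, with hypothetical helper names select_ps and zero_lanes, of using that mask as a per-lane "conditional move" through AND / ANDNOT / OR, and of inspecting it as an integer with _mm_movemask_ps:

```c
#include <xmmintrin.h> /* SSE1: _mm_cmpeq_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps, _mm_movemask_ps */

/* Hypothetical helper: per-lane select.
   Returns a in lanes where mask is all ones, b in lanes where mask is all zeros. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Hypothetical helper: bit k of the result is set when lane k of x equals zero.
   Both +0.0f and -0.0f compare equal to zero. */
static int zero_lanes(__m128 x)
{
    __m128 is_zero = _mm_cmpeq_ps(x, _mm_setzero_ps());
    return _mm_movemask_ps(is_zero); /* collects the sign bit of each lane */
}
```

For example, for the lanes {0, 2, -0, 4} (built with _mm_setr_ps), zero_lanes returns 5 (binary 0101: lanes 0 and 2), and select_ps(mask, _mm_set1_ps(-1.0f), x) yields {-1, 2, -1, 4}. On SSE4.1 hardware, _mm_blendv_ps(b, a, mask) performs the same select in a single instruction.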

I don't need code for the whole algorithm described above, just the "inverse if not zero" function:

__m128 rcp_nz_ps(__m128 input) {
    // ????
}

Thanks!

1 answer
三岁会撩人
#2 · 2019-04-06 15:08
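#include <xmmintrin.h> // SSE1 intrinsics

__m128 rcp_nz_ps(__m128 input) {
    // all-ones lanes where input == 0.0f (also true for -0.0f)
    __m128 mask = _mm_cmpeq_ps(_mm_setzero_ps(), input);
    // approximate 1/x; zero lanes give +/-infinity
    __m128 recip = _mm_rcp_ps(input);
    // (~mask) & recip: zero-input lanes become 0.0f, others keep recip
    return _mm_andnot_ps(mask, recip);
}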

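Each lane of mask is set to all ones (0xFFFFFFFF) if the corresponding input lane is zero (+0.0 or -0.0), and to all zeros otherwise. _mm_andnot_ps(mask, recip) computes (~mask) & recip, so every lane of the reciprocal that corresponds to a zero input (where _mm_rcp_ps returns an infinity) is replaced with 0.0f, while the other lanes pass through unchanged. Note that _mm_rcp_ps returns an approximation (relative error below 1.5 * 2^-12), not an exact reciprocal.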
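
When full precision matters, the same masking idea can be applied to a true division. Step 2 of the question also needs num / sum rather than 1 / sum, so a minimal sketch (hypothetical names div_nz_ps and step2_sse) divides a broadcast num by four sums at a time with _mm_div_ps, clears the zero lanes with ANDNOT, and finishes any leftover dims % 4 elements with the scalar code from the question:

```c
#include <xmmintrin.h> /* SSE1: _mm_div_ps, _mm_cmpeq_ps, _mm_andnot_ps */

/* Hypothetical helper: scale / x in each lane, or 0.0f where x == 0. */
static __m128 div_nz_ps(__m128 scale, __m128 x)
{
    __m128 is_zero = _mm_cmpeq_ps(x, _mm_setzero_ps());
    __m128 q = _mm_div_ps(scale, x);  /* infinity (or NaN if scale is 0) in zero lanes */
    return _mm_andnot_ps(is_zero, q); /* force those lanes to 0.0f */
}

/* Hypothetical helper for step 2: norm[j] = num / norm[j] wherever norm[j] != 0. */
static void step2_sse(float *norm, int dims, int num)
{
    const __m128 scale = _mm_set1_ps((float)num);
    int j = 0;
    for (; j + 4 <= dims; j += 4)
        _mm_storeu_ps(&norm[j], div_nz_ps(scale, _mm_loadu_ps(&norm[j])));
    for (; j < dims; j++)             /* scalar tail when dims is not a multiple of 4 */
        if (norm[j] != 0.0f)
            norm[j] = (float)num / norm[j];
}
```

For example, with num = 2 and norm = {4, 0, 8, -2, 0, 5}, step2_sse produces {0.5, 0, 0.25, -1, 0, 0.4}. Dividing by zero in the masked-off lanes only sets the divide-by-zero status flag in MXCSR; with the default (masked) exception settings it does not trap, and those lanes are discarded by the ANDNOT anyway.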
