How does this function compute the absolute value?

Posted 2019-02-23 17:04

Question:

I am trying to understand how the following code snippet works. This program uses SIMD vector instructions (Intel SSE) to calculate the absolute value of 4 floats (so, basically, a vectorized "fabs()" function).

Here is the snippet:

#include <iostream>
#include <xmmintrin.h>

template <typename T>
struct alignas(16) sse_t
{
    T data[16/sizeof(T)];
};

int main()
{
    sse_t<float> x;
    x.data[0] = -4.;
    x.data[1] = -20.;
    x.data[2] = 15.;
    x.data[3] = -143.;
    __m128 a = _mm_set_ps1(-0.0); // ???
    __m128 xv = _mm_load_ps(x.data);
    xv = _mm_andnot_ps(a,xv); // <-- Computes absolute value
    sse_t<float> result;
    _mm_store_ps(result.data, xv);
    std::cout << "x[0]: " << result.data[0] << std::endl;
    std::cout << "x[1]: " << result.data[1] << std::endl;
    std::cout << "x[2]: " << result.data[2] << std::endl;
    std::cout << "x[3]: " << result.data[3] << std::endl;
}

Now, I know it works, since I ran the program myself to test it. When compiled with g++ 4.8.2, the result is:

x[0]: 4
x[1]: 20
x[2]: 15
x[3]: 143

Three (related) questions puzzle me:

First, how is it even possible to take a bitwise function and apply it to a float? If I try this in vanilla C++, the compiler tells me that it only works for integral types (which makes sense).

But, second, and more importantly: how does it even work? How does taking a NOT and an AND help here? Trying this in Python with an integral type just gives the expected result: any integer ANDed with -1 (which is NOT 0) simply gives that number back and doesn't change its sign. So how does it work here?

Third, I noticed that if I change the value of the float used for the NAND operation (the line marked ???) from -0.0 to 0.0, the program no longer gives me the absolute value. But how can -0.0 even exist, and how does it help?

Helpful references:

Intel intrinsics guide

Answer 1:

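-0.0 is represented as 1000...000¹ (sign bit set, all exponent and mantissa bits zero). Therefore _mm_andnot_ps(-0.0, x)² is equivalent to 0111...111 & x. This forces the MSB (which is the sign bit) to 0 and leaves the other 31 bits of each float unchanged.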


1. In IEEE-754, at least.

2. The _mm_andnot_ps intrinsic does not mean "NAND": _mm_andnot_ps(a, b) computes (NOT a) AND b for each bit, whereas NAND would be NOT (a AND b); see e.g. http://msdn.microsoft.com/en-us/library/68h7wd02(v=vs.90).aspx.