I am trying to understand how the following code snippet works. This program uses SIMD vector instructions (Intel SSE) to calculate the absolute value of 4 floats (so, basically, a vectorized "fabs()" function).
Here is the snippet:
#include <iostream>
#include <xmmintrin.h>

template <typename T>
struct alignas(16) sse_t
{
    T data[16/sizeof(T)];
};

int main()
{
    sse_t<float> x;
    x.data[0] = -4.;
    x.data[1] = -20.;
    x.data[2] = 15.;
    x.data[3] = -143.;

    __m128 a = _mm_set_ps1(-0.0); // ???
    __m128 xv = _mm_load_ps(x.data);
    xv = _mm_andnot_ps(a,xv); // <-- Computes absolute value

    sse_t<float> result;
    _mm_store_ps(result.data, xv);

    std::cout << "x[0]: " << result.data[0] << std::endl;
    std::cout << "x[1]: " << result.data[1] << std::endl;
    std::cout << "x[2]: " << result.data[2] << std::endl;
    std::cout << "x[3]: " << result.data[3] << std::endl;
}
Now, I know it works, since I ran the program myself to test it. When compiled with g++ 4.8.2, the result is:
x[0]: 4
x[1]: 20
x[2]: 15
x[3]: 143
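For comparison, here is a minimal scalar sketch of what I would expect the vectorized version to be equivalent to. The function name scalar_abs4 is made up for illustration and is not part of the program above; it simply applies std::fabs to each of the four lanes.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference: element-wise fabs over four floats.
// This is only a baseline to compare the SSE result against.
std::array<float, 4> scalar_abs4(const std::array<float, 4>& in)
{
    std::array<float, 4> out{};
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = std::fabs(in[i]);
    return out;
}
```

Calling scalar_abs4 with the same four inputs (-4, -20, 15, -143) should give the same values as the SSE output listed above.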
Three (related) questions puzzle me:
First, how is it even possible to apply a bitwise operation to a float? If I try this in vanilla C++ (for example, writing a & b with two floats), the compiler tells me that bitwise operators only work on integral types (which makes sense).
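To make this first question concrete: the closest thing I can write in plain C++ is to copy the float's bytes into an integer, operate on that, and copy the result back. The sketch below assumes float is a 32-bit IEEE-754 value; the helper names float_bits, bits_to_float and bitwise_and are hypothetical and only serve as illustration.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copy the raw bits of a float into an unsigned integer.
// std::memcpy avoids the undefined behavior of a pointer cast.
std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Copy raw bits back into a float.
float bits_to_float(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// "a & b" on floats does not compile, so AND the bit patterns instead.
float bitwise_and(float a, float b)
{
    return bits_to_float(float_bits(a) & float_bits(b));
}
```

My understanding is that the SSE intrinsics skip this detour because an __m128 register is just 128 bits, but I would like that confirmed.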
But, second, and more importantly: How does it even work? How does taking a NOT and an AND even help you here? Trying this in Python with an integral type just gives you the expected result: any integer ANDed with -1 (which is NOT 0) simply gives you that number back, with its sign unchanged. So how does it work here?
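The integer experiment I mean looks roughly like this when written in C++ instead of Python. The function name and_with_not_zero is hypothetical; it just spells out "x AND (NOT 0)" for a 32-bit signed integer.

```cpp
#include <cstdint>

// NOT 0 has every bit set (it is -1 in two's complement),
// so ANDing any value with it returns that value unchanged.
std::int32_t and_with_not_zero(std::int32_t x)
{
    return x & ~std::int32_t{0};
}
```

With this, a negative input such as -4 comes back as -4, which is why the float result surprises me.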
Third, I noticed that if I change the value of the float used for the AND-NOT operation (the line marked with three ???) from -0.0 to 0.0, the program doesn't give me the absolute value anymore. But how can a -0.0 even exist, and how does it help?
Helpful references:
Intel intrinsics guide