I'm still pretty new to using SSE and am trying to implement a modulo of 2*Pi
for double-precision inputs of the order 1e8
(the result of which will be fed into some vectorised trig calculations).
My current attempt at the code is based around the idea that mod(x, 2*Pi) = x - floor(x/(2*Pi))*2*Pi
and looks like:
#define _PD_CONST(Name, Val) \
static const double _pd_##Name[2] __attribute__((aligned(16))) = { Val, Val }
_PD_CONST(2Pi, 6.283185307179586); /* = 2*pi */
_PD_CONST(recip_2Pi, 0.159154943091895); /* = 1/(2*pi) */
void vec_mod_2pi(const double * vec, int Size, double * modAns)
{
__m128d sse_a, sse_b, sse_c;
int i;
int k = 0;
double t = 0;
unsigned int initial_mode;
initial_mode = _MM_GET_ROUNDING_MODE();
_MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);
for (i = 0; i < Size; i += 2)
{
sse_a = _mm_loadu_pd(vec+i);
sse_b = _mm_mul_pd( _mm_cvtepi32_pd( _mm_cvtpd_epi32( _mm_mul_pd(sse_a, *(__m128d*)_pd_recip_2Pi) ) ), *(__m128d*)_pd_2Pi);
sse_c = _mm_sub_pd(sse_a, sse_b);
_mm_storeu_pd(modAns+i,sse_c);
}
k = i-2;
for (i = 0; i < Size%2; i++)
{
t = (double)((int)(vec[k+i] * 0.159154943091895)) * 6.283185307179586;
modAns[k+i] = vec[k+i] - t;
}
_MM_SET_ROUNDING_MODE(initial_mode);
}
Unfortunately, this is currently returning a lot of NaN
with a couple of answers of 1.128e119
as well (some what outside the range of 0
-> 2*Pi
that I was aiming for!). I suspect that where I'm going wrong is in the double-to-int-to-double conversion that I'm trying to use to do the floor
.
Can anyone suggest where I've gone wrong and how to improve it?
P.S. sorry about the format of that code, it's the first time I've posted a question on here and can't seem to get it to give me empty lines within the code block to make it readable.