I am trying to create a fast decoder for BPSK using Intel's AVX intrinsics. I have a set of complex numbers represented as interleaved floats, but because of the BPSK modulation only the real part (the even-indexed floats) is needed. Every float x is mapped to 0 when x < 0 and to 1 when x >= 0. This is accomplished using the following routine:
static inline void
normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
{
    static const __m256 _min_mask = _mm256_set1_ps(-1.0);
    static const __m256 _max_mask = _mm256_set1_ps(1.0);
    static const __m256 _mul_mask = _mm256_set1_ps(0.5);
    __m256 res;
    __m256i int_res;
    size_t i;
    gr_complex temp;
    float real;

    for (i = 0; i < num; i += COMPLEX_PER_AVX_REG) {
        res = _mm256_load_ps((float *)&in[i]);
        /* clamp them to avoid segmentation faults due to indexing */
        res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));
        /* Scale accordingly for proper indexing -1->0, 1->1 */
        res = _mm256_add_ps(res, _max_mask);
        res = _mm256_mul_ps(res, _mul_mask);
        /* And then round to the nearest integer */
        res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        int_res = _mm256_cvtps_epi32(res);
        _mm256_store_si256((__m256i *) &out[2*i], int_res);
    }
}
First, I clamp all the received floats to the range [-1, 1]. Then, after scaling, the result is rounded to the nearest integer, which maps all scaled values above 0.5 to 1 and all scaled values below 0.5 to 0.
The procedure works fine if the input floats are normal numbers. However, because of conditions in earlier stages, some input floats may be NaN or -NaN. In that case, the NaN values propagate through _mm256_max_ps(), _mm256_min_ps() and all the other AVX functions, resulting in an integer mapping of -2147483648, which of course crashes my program due to invalid indexing.
Is there any workaround to avoid this problem, or at least a way to set the NaN values to 0 using AVX?
You could do it the simple way to begin with, compare and mask: (not tested)
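A minimal sketch of the compare-and-mask idea, assuming AVX2 is available for the integer shift. The helper name bpsk_cmp_mask, the unaligned loads and stores, and the requirement that the float count be a multiple of 8 are assumptions made for illustration. The ordered predicate _CMP_GE_OQ is chosen here so that NaN compares false and therefore maps to 0:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang attribute so the sketch builds without -mavx2 on the command
 * line; drop it if you already compile with that flag. */
__attribute__((target("avx2")))
static void bpsk_cmp_mask(int32_t *out, const float *in, size_t nfloats)
{
    const __m256 zero = _mm256_setzero_ps();

    /* Assumption: nfloats is a multiple of 8 (one AVX register per step). */
    for (size_t i = 0; i < nfloats; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        /* Ordered compare: all-ones where x >= 0, all-zeros where x < 0
         * or x is NaN (either sign). */
        __m256 ge = _mm256_cmp_ps(x, zero, _CMP_GE_OQ);
        /* Logical shift keeps only the top bit: 0xFFFFFFFF -> 1, 0 -> 0. */
        __m256i bits = _mm256_srli_epi32(_mm256_castps_si256(ge), 31);
        _mm256_storeu_si256((__m256i *)(out + i), bits);
    }
}
```

No clamping, scaling or rounding is needed, and the result is always 0 or 1, so it is safe to use as an index. Note that -0.0 compares equal to 0 and therefore maps to 1.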
Or shift and xor: (also not tested)
This version will also care about the sign of NaN (and ignore the NaN-ness).
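The shift-and-xor variant might look like the sketch below, again assuming AVX2. The helper name bpsk_sign_bit and the multiple-of-8 length are illustrative assumptions. It reads only the sign bit of each float, so a NaN with the sign bit clear maps to 1 and a NaN with the sign bit set maps to 0:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang attribute so the sketch builds without -mavx2 on the command
 * line; drop it if you already compile with that flag. */
__attribute__((target("avx2")))
static void bpsk_sign_bit(int32_t *out, const float *in, size_t nfloats)
{
    const __m256i one = _mm256_set1_epi32(1);

    /* Assumption: nfloats is a multiple of 8. */
    for (size_t i = 0; i < nfloats; i += 8) {
        /* Reinterpret the float bits as integers; no conversion happens. */
        __m256i raw = _mm256_castps_si256(_mm256_loadu_ps(in + i));
        /* Move the sign bit to bit 0: 1 for negative, 0 for non-negative. */
        __m256i sign = _mm256_srli_epi32(raw, 31);
        /* Invert it so that negative -> 0 and non-negative -> 1. */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_xor_si256(sign, one));
    }
}
```

Unlike a floating-point compare, this treats -0.0 as negative (result 0), which may or may not matter for your data.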
Alternative for no AVX2 (not tested)
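One way this could be done with plain AVX is to AND the compare mask with 1.0f and convert the result to integers, which avoids the 256-bit integer shift that needs AVX2. This is a sketch under assumptions: the helper name bpsk_cmp_and is hypothetical, the length must be a multiple of 8, and the ordered predicate is chosen so that NaN maps to 0:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang attribute so the sketch builds without -mavx on the command
 * line; drop it if you already compile with that flag. */
__attribute__((target("avx")))
static void bpsk_cmp_and(int32_t *out, const float *in, size_t nfloats)
{
    const __m256 zero = _mm256_setzero_ps();
    const __m256 one  = _mm256_set1_ps(1.0f);

    /* Assumption: nfloats is a multiple of 8. */
    for (size_t i = 0; i < nfloats; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        /* All-ones where x >= 0, all-zeros where x < 0 or x is NaN. */
        __m256 ge = _mm256_cmp_ps(x, zero, _CMP_GE_OQ);
        /* Mask AND 1.0f yields exactly 1.0f or 0.0f in each lane. */
        __m256 sel = _mm256_and_ps(ge, one);
        /* 1.0f and 0.0f convert exactly, whatever the rounding mode. */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_cvtps_epi32(sel));
    }
}
```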
Harold posted a good solution for the question you were really asking, but I want to make clear that eliminating NaN values while clamping is totally straightforward. If either argument is a NaN, MINPS and MAXPS simply return the second argument. So all you need to do is swap the argument order and NaNs will be clamped as well. For example, the following would clamp NaNs to _min_mask:
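A sketch of what that could look like when dropped into the question's clamp, scale and convert sequence. The helper name bpsk_clamp_nan_safe, the local constant names, and the multiple-of-8 length are assumptions for illustration. It also assumes the compiler is not allowed to reorder the operands (so no -ffast-math) and that the default round-to-nearest mode is active for _mm256_cvtps_epi32:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang attribute so the sketch builds without -mavx on the command
 * line; drop it if you already compile with that flag. */
__attribute__((target("avx")))
static void bpsk_clamp_nan_safe(int32_t *out, const float *in, size_t nfloats)
{
    const __m256 lo   = _mm256_set1_ps(-1.0f);
    const __m256 hi   = _mm256_set1_ps(1.0f);
    const __m256 half = _mm256_set1_ps(0.5f);

    /* Assumption: nfloats is a multiple of 8. */
    for (size_t i = 0; i < nfloats; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        /* Data first, bound second: if x is NaN, the second operand (lo)
         * is returned, so NaN becomes -1.0f here. */
        x = _mm256_max_ps(x, lo);
        x = _mm256_min_ps(x, hi);
        /* Scale [-1, 1] to [0, 1], as in the question. */
        x = _mm256_mul_ps(_mm256_add_ps(x, hi), half);
        /* Convert using the current rounding mode (nearest by default). */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_cvtps_epi32(x));
    }
}
```

With this ordering both NaN and -NaN end up as -1.0f after the first step and therefore as 0 in the output.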