In SSE there is a function _mm_cvtepi32_ps(__m128i input) which takes an input vector of 32-bit signed integers (int32_t) and converts them into floats.
Now I want to interpret the input integers as unsigned. But there is no function _mm_cvtepu32_ps, and I could not find an implementation of one. Do you know where I can find such a function, or can you at least give a hint on the implementation?
To illustrate the difference in results:
unsigned int a = 2480160505; // 10010011 11010100 00111110 11111001
float a1 = a; // 01001111 00010011 11010100 00111111;
float a2 = (signed int)a; // 11001110 11011000 01010111 10000010
With Paul R's solution and with my previous solution, the difference between the rounded floating point value and the original integer is less than or equal to 0.75 ULP (Unit in the Last Place). In these methods rounding may occur in two places: in _mm_cvtepi32_ps and in _mm_add_ps. This leads to results that are not as accurate as possible for some inputs.
For example, with Paul R's method 0x2000003=33554435 is converted to 33554432.0, but 33554436.0 also exists as a float and is closer to the input, so it would have been the better result here. My previous solution suffers from similar inaccuracies. Such inaccurate results may also occur with compiler-generated code, see here.
Following the approach of gcc (see Peter Cordes' answer to that other SO question), an accurate conversion within 0.5 ULP is obtained:
Note that other high bits/low bits partitions are possible as long as _mm_cvtepi32_ps can convert both pieces to floats without rounding, that is, as long as each piece fits in the 24-bit significand of a float. For example, a partition with 20 high bits and 12 low bits will work equally well.
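A minimal sketch of the split-and-recombine idea described above, assuming a partition into 16 high bits and 16 low bits. The function names cvtepu32_ps_split16, u32_to_float_split16 and demo_split16 are made up for this illustration and are not part of any library. Both pieces convert exactly, the multiplication by 65536.0f is exact, so the final addition should be the only place where rounding can happen.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts four unsigned 32-bit lanes to float
   with a single rounding step (the final addition). */
static inline __m128 cvtepu32_ps_split16(__m128i v)
{
    const __m128i mask_lo = _mm_set1_epi32(0xFFFF);
    const __m128  two16   = _mm_set1_ps(65536.0f);

    __m128i v_lo = _mm_and_si128(v, mask_lo);  /* 16 least significant bits     */
    __m128i v_hi = _mm_srli_epi32(v, 16);      /* 16 most significant bits      */
    __m128  lo_f = _mm_cvtepi32_ps(v_lo);      /* exact, value is below 2^16    */
    __m128  hi_f = _mm_cvtepi32_ps(v_hi);      /* exact, value is below 2^16    */
    hi_f = _mm_mul_ps(hi_f, two16);            /* exact, scaling by a power of 2 */
    return _mm_add_ps(hi_f, lo_f);             /* rounding may occur only here  */
}

/* Scalar wrapper for experiments: broadcasts x and returns lane 0.
   The cast to int relies on the usual two's complement wrap-around. */
static inline float u32_to_float_split16(uint32_t x)
{
    return _mm_cvtss_f32(cvtepu32_ps_split16(_mm_set1_epi32((int)x)));
}

/* Checks the values discussed in this thread. */
static void demo_split16(void)
{
    assert(u32_to_float_split16(0x2000003u)  == 33554436.0f);
    assert(u32_to_float_split16(2480160505u) == 2480160512.0f);
    assert(u32_to_float_split16(UINT32_MAX)  == 4294967296.0f);
}
```

Calling demo_split16() from main should pass silently. A 20/12 partition would only change the mask, the shift count and the scale factor.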
This functionality exists in AVX-512, but if you can't wait until then, the only thing I can suggest is to convert the unsigned int input values into pairs of smaller values, convert these, and then add them together again.
UPDATE
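For reference, the pairwise idea from before this update can be sketched roughly as follows. This is one possible reading, chosen to match the v1, v2, v1f and v2f values quoted elsewhere in this thread. The names cvtepu32_ps_halves, u32_to_float_halves and demo_halves are hypothetical. The last assertion shows the corner case in which this version breaks down.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: split v into two roughly equal halves,
   convert each half as a signed integer, then add the floats. */
static inline __m128 cvtepu32_ps_halves(__m128i v)
{
    __m128i v2  = _mm_srli_epi32(v, 1);   /* v2 = v / 2 (logical shift) */
    __m128i v1  = _mm_sub_epi32(v, v2);   /* v1 = v - v / 2             */
    __m128  v2f = _mm_cvtepi32_ps(v2);    /* may round                  */
    __m128  v1f = _mm_cvtepi32_ps(v1);    /* may round                  */
    return _mm_add_ps(v2f, v1f);
}

/* Scalar wrapper for experiments: broadcasts x and returns lane 0.
   The cast to int relies on the usual two's complement wrap-around. */
static inline float u32_to_float_halves(uint32_t x)
{
    return _mm_cvtss_f32(cvtepu32_ps_halves(_mm_set1_epi32((int)x)));
}

static void demo_halves(void)
{
    assert(u32_to_float_halves(1000u)       == 1000.0f);
    assert(u32_to_float_halves(2480160505u) == 2480160512.0f);
    /* v1 becomes 2^31, which the signed conversion reads as -2^31,
       while v2 = 2^31 - 1 rounds up to +2^31, so the sum is 0. */
    assert(u32_to_float_halves(UINT32_MAX)  == 0.0f);
}
```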
As noted by @wim in his answer, the above solution fails for an input value of UINT_MAX. Here is a more robust, but slightly less efficient solution, which should work for the full uint32_t input range:
I think Paul's answer is nice, but it fails for v=4294967295U (=2^32-1). In that case v2=2^31-1 and v1=2^31. Intrinsic _mm_cvtepi32_ps converts 2^31 to -2.14748365E9. v2=2^31-1 is converted to 2.14748365E9 and consequently _mm_add_ps returns 0 (due to rounding, v1f and v2f are the exact opposite of each other).
The idea of the solution below is to copy the most significant bit of v to v_high. The other bits of v are copied to v_low. v_high is converted to 0 or 2.14748365E9.
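A rough sketch of the v_high/v_low idea, as I understand it from the description above. The function names cvtepu32_ps_msb, u32_to_float_msb and demo_msb are hypothetical, and selecting 2^31 via a compare against zero is just one way to do it. Only v_low goes through _mm_cvtepi32_ps, and since its top bit is cleared it is never misread as negative.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: handle the most significant bit separately. */
static inline __m128 cvtepu32_ps_msb(__m128i v)
{
    const __m128i msk0  = _mm_set1_epi32(0x7FFFFFFF);
    const __m128  two31 = _mm_set1_ps(2147483648.0f);       /* 2^31 as float */

    __m128i v_high  = _mm_andnot_si128(msk0, v);            /* only bit 31   */
    __m128i v_low   = _mm_and_si128(msk0, v);               /* bits 0..30    */
    __m128  v_lowf  = _mm_cvtepi32_ps(v_low);               /* never negative */
    __m128i is_zero = _mm_cmpeq_epi32(v_high, _mm_setzero_si128());
    /* v_highf is 0.0f where bit 31 was clear and 2^31 where it was set */
    __m128  v_highf = _mm_andnot_ps(_mm_castsi128_ps(is_zero), two31);
    return _mm_add_ps(v_lowf, v_highf);
}

/* Scalar wrapper for experiments: broadcasts x and returns lane 0.
   The cast to int relies on the usual two's complement wrap-around. */
static inline float u32_to_float_msb(uint32_t x)
{
    return _mm_cvtss_f32(cvtepu32_ps_msb(_mm_set1_epi32((int)x)));
}

static void demo_msb(void)
{
    assert(u32_to_float_msb(0u)          == 0.0f);
    assert(u32_to_float_msb(0x80000000u) == 2147483648.0f);
    assert(u32_to_float_msb(2480160505u) == 2480160512.0f);
    assert(u32_to_float_msb(UINT32_MAX)  == 4294967296.0f);
}
```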
Update
It was possible to reduce the number of instructions:
Intrinsic _mm_srai_epi32 shifts the most significant bit of v to the right, while shifting in sign bits, which turns out to be quite useful here.
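A sketch of how the shorter variant might look, assuming the arithmetic shift by 31 is used to build an all-ones or all-zeros mask per lane. Again the names cvtepu32_ps_srai, u32_to_float_srai and demo_srai are made up for this example. Compared with a compare-against-zero approach, this needs no separate extraction of the top bit.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: use an arithmetic shift to turn bit 31
   of each lane into a full-width mask. */
static inline __m128 cvtepu32_ps_srai(__m128i v)
{
    const __m128i msk0  = _mm_set1_epi32(0x7FFFFFFF);
    const __m128  two31 = _mm_set1_ps(2147483648.0f);   /* 2^31 as float */

    __m128i msk1    = _mm_srai_epi32(v, 31);            /* all ones if bit 31 is set */
    __m128i v_low   = _mm_and_si128(msk0, v);           /* bits 0..30    */
    __m128  v_lowf  = _mm_cvtepi32_ps(v_low);           /* never negative */
    __m128  v_highf = _mm_and_ps(_mm_castsi128_ps(msk1), two31);  /* 0 or 2^31 */
    return _mm_add_ps(v_lowf, v_highf);
}

/* Scalar wrapper for experiments: broadcasts x and returns lane 0.
   The cast to int relies on the usual two's complement wrap-around. */
static inline float u32_to_float_srai(uint32_t x)
{
    return _mm_cvtss_f32(cvtepu32_ps_srai(_mm_set1_epi32((int)x)));
}

static void demo_srai(void)
{
    assert(u32_to_float_srai(0u)          == 0.0f);
    assert(u32_to_float_srai(0x80000000u) == 2147483648.0f);
    assert(u32_to_float_srai(2480160505u) == 2480160512.0f);
    assert(u32_to_float_srai(UINT32_MAX)  == 4294967296.0f);
}
```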