I have a 32 bit floating point number f (known to be positive) that I need to convert to a 32 bit unsigned integer. Its magnitude might be too large to fit. Furthermore, there is downstream computation that requires some headroom. I can compute the maximum acceptable value m as a 32 bit integer. How do I efficiently determine in C++11 on a constrained 32 bit machine (ARM M4F) whether f <= m holds mathematically? Note that the types of the two values don't match. The following three approaches each have their issues:
static_cast<uint32_t>(f) <= m: I think this triggers undefined behaviour if f doesn't fit the 32 bit integer.
f <= static_cast<float>(m): if m is too large to be converted exactly, the converted value could be larger than m, such that the subsequent comparison will produce the wrong result in certain edge cases.
static_cast<double>(f) <= static_cast<double>(m): is mathematically correct, but requires casting to, and working with, double, which I'd like to avoid for efficiency reasons.
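To make the edge case of the second approach concrete, here is a small sketch. The helper names are hypothetical, and it assumes the integer-to-float conversion rounds to nearest, which is typical but implementation-defined in C++11. With m = 2^32 - 1 the converted value becomes 2^32, so f = 2^32 is wrongly accepted by the float comparison, while the double comparison rejects it.

```cpp
#include <cstdint>

// Hypothetical helper: second approach from the list, compare in float.
// static_cast<float>(m) may round up when m needs more than 24 significant bits.
bool naive_float_compare(float f, std::uint32_t m) {
    return f <= static_cast<float>(m);
}

// Hypothetical helper: third approach from the list, compare in double.
// Both a float and a 32 bit integer convert to double without rounding.
bool double_compare(float f, std::uint32_t m) {
    return static_cast<double>(f) <= static_cast<double>(m);
}
```

For example, naive_float_compare(4294967296.0f, 4294967295u) is expected to return true under round-to-nearest even though f exceeds m, whereas double_compare returns false for the same inputs.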
Surely there must be a way to convert an integer to a float directly with specified rounding direction, i.e. guaranteeing the result not to exceed the input in magnitude. I'd prefer a C++11 standard solution, but in the worst case platform intrinsics could qualify as well.
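One way to get such a round-down conversion without touching the rounding mode is to clear the low bits of m before converting, so that the value handed to the conversion is exactly representable. The following is only a sketch with hypothetical function names; it relies on a float having 24 significant bits. Because f is itself a float, f <= m holds exactly when f is no larger than the largest float not exceeding m.

```cpp
#include <cstdint>

// Hypothetical helper: largest float that does not exceed m.
float to_float_round_down(std::uint32_t m) {
    // Find how many low bits must be dropped so that at most 24
    // significant bits remain (at most 8 iterations for a 32 bit value).
    unsigned shift = 0;
    while ((m >> shift) >= (1u << 24)) {
        ++shift;
    }
    const std::uint32_t truncated = (m >> shift) << shift;
    // Exact conversion: truncated has at most 24 significant bits.
    return static_cast<float>(truncated);
}

// Hypothetical helper: true iff f <= m mathematically.
bool le_via_round_down(float f, std::uint32_t m) {
    return f <= to_float_round_down(m);
}
```

On a target with a count-leading-zeros instruction, the loop could presumably be replaced by a compiler intrinsic such as GCC's __builtin_clz; that is a platform-specific assumption and not part of standard C++11.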
I think your best bet is to be a bit platform specific. 2³² can be represented precisely in floating point. Check if f is too large to fit at all, and then convert to unsigned and check against m.
Not fond of the double comparison, but it's clear.
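The range check followed by a conversion could be sketched as below. This is not the answerer's original code; the function name is hypothetical, and f is assumed to be non-negative as stated in the question. Since the conversion truncates, the sketch adds one extra step for the case where the truncated value equals m, so that a fractional f such as 3.5 is not treated as <= 3.

```cpp
#include <cstdint>

// Hypothetical helper: true iff f <= m mathematically, assuming f >= 0.
bool le_via_range_check(float f, std::uint32_t m) {
    // 2^32 is exactly representable as a float. Anything at or above it
    // cannot fit, and is larger than any uint32_t. This also rejects NaN.
    if (!(f < 4294967296.0f)) {
        return false;
    }
    // Well defined now: the truncated value fits in 32 bits.
    const std::uint32_t u = static_cast<std::uint32_t>(f);
    if (u != m) {
        return u < m;
    }
    // u == m: f <= m only if f had no fractional part.
    // u is the truncation of a float, so converting it back is exact.
    return static_cast<float>(u) == f;
}
```

If only the truncated integer matters downstream, the last step can be dropped and the function can simply return u <= m.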
If f is usually significantly less than m (or usually significantly greater), one can test against float(m)*0.99f (respectively float(m)*1.01f), and then do the exact comparison in the unusual case. That is probably only worth doing if profiling shows that the performance gain is worth the extra complexity.
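A minimal sketch of this fast-path idea follows. The function name is hypothetical, both margins are checked here for illustration (in practice one would probably keep only the common one), and the double comparison from the question is used as the exact fallback. The 1% margins are far wider than the rounding error of the integer-to-float conversion, so the two early exits should never give a wrong answer.

```cpp
#include <cstdint>

// Hypothetical helper: true iff f <= m mathematically, assuming f >= 0.
bool le_with_fast_path(float f, std::uint32_t m) {
    // Possibly inexact, but within a relative error of about 2^-24.
    const float approx = static_cast<float>(m);
    if (f <= approx * 0.99f) {
        return true;   // clearly below m
    }
    if (f >= approx * 1.01f) {
        return false;  // clearly above m
    }
    // Rare close call: fall back to an exact comparison.
    return static_cast<double>(f) <= static_cast<double>(m);
}
```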