I have a 32-bit floating-point number f (known to be positive) that I need to convert to a 32-bit unsigned integer. Its magnitude might be too large to fit. Furthermore, there is downstream computation that requires some headroom. I can compute the maximum acceptable value m as a 32-bit integer. How do I efficiently determine in C++11, on a constrained 32-bit machine (ARM M4F), whether f <= m holds mathematically? Note that the types of the two values don't match. The following three approaches each have their issues:
- static_cast<uint32_t>(f) <= m: I think this triggers undefined behaviour iff f doesn't fit the 32-bit integer.
- f <= static_cast<float>(m): if m is too large to be converted exactly, the converted value could be larger than m, such that the subsequent comparison will produce the wrong result in certain edge cases.
- static_cast<double>(f) <= static_cast<double>(m): this is mathematically correct, but requires casting to, and working with, double, which I'd like to avoid for efficiency reasons.
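To make the edge case of the second approach concrete, here is a minimal sketch comparing it with the double-based comparison. The function names le_via_float and le_via_double are made up for illustration, and the sketch assumes IEEE 754 single and double precision with the usual round-to-nearest-even integer-to-float conversion (the C++ standard only guarantees that one of the two neighbouring values is chosen).

```cpp
#include <cstdint>

// Approach 2: compare in float.
// Assumption: converting m rounds to nearest (ties to even), so the
// converted value may end up larger than m.
bool le_via_float(float f, std::uint32_t m) {
    return f <= static_cast<float>(m);
}

// Approach 3: compare in double.
// Every float and every 32-bit integer is exactly representable in an
// IEEE 754 double, so this comparison is mathematically exact.
bool le_via_double(float f, std::uint32_t m) {
    return static_cast<double>(f) <= static_cast<double>(m);
}
```

Under these assumptions, f = 16777220.0f and m = 16777219 is one such edge case: m lies halfway between the floats 16777218 and 16777220 and is rounded up to 16777220, so le_via_float returns true although f > m, whereas le_via_double returns false.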
Surely there must be a way to convert an integer to a float directly with specified rounding direction, i.e. guaranteeing the result not to exceed the input in magnitude. I'd prefer a C++11 standard solution, but in the worst case platform intrinsics could qualify as well.
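To pin down the semantics I am after, here is a portable sketch of a round-down conversion. The names to_float_round_down and le_round_down are hypothetical, and the code assumes a float with a 24-bit significand (IEEE 754 single precision). It is only meant to illustrate the intended behaviour, not to be an efficient implementation.

```cpp
#include <cstdint>

// Hypothetical helper: returns the largest float that does not exceed m.
// Assumption: float has a 24-bit significand (IEEE 754 binary32).
float to_float_round_down(std::uint32_t m) {
    // Count how many bits m occupies above the 24 that a float can hold.
    unsigned excess = 0;
    for (std::uint32_t top = m >> 24; top != 0; top >>= 1) {
        ++excess;  // at most 8 for a 32-bit input
    }
    // Clear that many low-order bits. The remaining value has at most 24
    // significant bits, so the conversion below is exact.
    m &= ~((std::uint32_t{1} << excess) - 1u);
    return static_cast<float>(m);
}

// Since f is itself a float, f <= m holds mathematically exactly when
// f does not exceed the largest float that is <= m.
bool le_round_down(float f, std::uint32_t m) {
    return f <= to_float_round_down(m);
}
```

For example, to_float_round_down(16777219) yields 16777218.0f rather than 16777220.0f, so le_round_down(16777220.0f, 16777219) is false, as required.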