I'm looking to take an IEEE double and remove any integer part of it in the most efficient manner possible.
I want
1035 ->0
1045.23->0.23
253e-23=253e-23
I do not care about properly handling denormals, infinities, or NaNs. I do not mind bit twiddling, as I know I am working with IEEE doubles, so it should work across machines.
Branchless code would be much preferred.
My first thought is (in pseudo code)
char exp=d.exponent;
(set the last bit of the exponent to 1)
d<<=exp*(exp>0);
(& mask the last 52 bits of d)
(shift d left until the last bit of the exponent is zero, decrementing exp each time)
d.exponent=exp;
But the problem is that I can't think of an efficient way to shift d left until the last bit of the exponent is zero, plus it seems it would need to output zero if all of the last bits weren't set. This seems to be related to the base 2 logarithm problem.
Help with this algorithm or any better ones would be much appreciated.
I should probably note that the reason I want branchless code is because I want it to efficiently vectorize.
How about something simple?
This just subtracts the integer portion of the double from the value itself, the remainder should be the fractional component. It's possible, of course, this could have some representation issues.
The optimal implementation depends entirely on the target architecture.
On recent Intel processors, this can be achieved with two instructions:
roundsd
andsubsd
, but that can't be expressed in portable C code.On some processors, the fastest way to do this is with integer operations on the floating point representation. Early Atom and many ARM CPUs come to mind.
On some other processors, the fastest thing is to cast to integer and back, then subtract, branching to protect large values.
If you're going to be handling lots of values, you can set the rounding mode to round-to-zero, then add and subtract +/-2^52 to the number truncated to integer, then subtract from the original value to get the fraction. If you don't have SSE4.1, but do have an otherwise modern Intel CPU and want to vectorize, this is typically the best you can do. It only makes sense if you have many values to process, however, because changing the rounding mode is somewhat expensive.
On other architectures, other implementations are optimal. In general, it doesn't make sense to talk about "efficiency" of C programs; only the efficiency of a specific implementation on a specific architecture.
Standard library function modf solves this problem quite neatly.
This should do what you have asked, is portable, and reasonably efficient.
An undocumented detail is whether the second argument could be NULL and then avoid the integral part temporary, which would be desirable in uses such as that you have described.
Unfortunately it seams many implementations do not support NULL for the second argument, so will have to use a temporary whether or not you use this value.
Some profiling and experimentation using C++ in Microsoft Visual Studio 2015 indicates that the best method for positive numbers is:
It's faster than
modf
, and, as has already been mentioned, the remainder function rounds to the nearest integer, and therefore isn't of use.Proposal
The function
remainder
computes the remainder, but not the integer part likemodf
does:This is the most efficient (and portable) way, as it doesn't compute unnecessary values to do the job (cf.
modf
,(long)
,fmod
, etc.)Benchmark
As Mattew suggested in the comments, I wrote some benchmark code to compare this solution to all the other ones offered on this page.
Please find below the time measurements for 65536 computations (compiled with Clang with optimizations turned off):
Again with Clang, this time using the
-O3
flag:Turns out the simplest solution seems to give the best results on most platforms, and the specific methods to perform that task (
fmod
,modf
,remainder
) are actually super-slow!