Is it possible to make a reciprocal of float division in form of look up table (such like 1/f -> 1*inv[f] ) ? How it could be done? I think some and mask and shift should be appled to float to make it a form of index? How would be it exectly?
相关问题
- Sorting 3 numbers without branching [closed]
- Multiple sockets for clients to connect to
- How to compile C++ code in GDB?
- Why does const allow implicit conversion of refere
- thread_local variables initialization
One method is:
In step 2, you'll need to choose the number of bits to use, trading between accuracy and table size. You could obtain more accuracy by using the less significant bits to interpolate between table entries.
In step 3, the adjustment is necessary because the input mantissa was in the range (0.5, 1.0], and so its reciprocal is in the range [1.0, 2.0), which needs renormalising to give the output mantissa.
I won't try to write the code for this, since there are probably some slightly fiddly edge cases that I'd miss.
You should also investigate methods involving numerical calculations, which might give better results if memory access is slow; on a modern PC architecture, a cache miss might be as expensive as dozens of arithmetic operations. Wikipedia looks like a good starting point. And of course, whatever you do, measure it to make sure it is actually faster than an FPU division operation.
If your minimum step is something like 0.01 then you can support inverse-f from a table. Each index is multiplied by 100 so you can have
If you want better precision, you will need more ram.
Another example:
Another example:
You can guess an approximate inverse like this:
In my tests, 0x7EF19D07 was slightly better (tested with the effects of 2 Newton-Raphson refinements included).
Which you can then improve with Newton-Raphson:
Iterate as often as you want. 2 or 3 iterations give nice results.
Better Initial Approximations
To minimize the relative error:
To minimize the absolute error for inputs between 1 and 2 (they work well enough outside that range, but they may not be the best):
For three refinement steps, the initial approximation barely affects the maximum relative error. 0x7EEEEEEE works great, and I couldn't find anything better.