How to implement fast inverse sqrt without undefin

2019-02-12 00:56发布

This question already has an answer here:

From what I understood about strict aliasing rule, this code for fast inverse square root will result in undefined behavior in C++:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y; // type punning
    i  = 0x5f3759df - ( i >> 1 );
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );

    return y;
}

Does this code indeed cause UB? If yes, how can it be reimplemented in standard compliant way? If not, why not?

Assumptions: before calling this function we have somehow checked that floats are in IEEE 754 32-bit format, sizeof(long)==sizeof(float) and the platform is little-endian.

4条回答
Explosion°爆炸
2楼-- · 2019-02-12 01:04

I don't think you can do this without UB in the strict standardese sense, simply because it relies on things like a particular value representation of float and long. The standard does not specify these (on purpose), and to give implementations the freedom to do what they see fit in this regard, specifically applies the "UB" label to all behaviour which would depend on such things.

That doesn't mean this will be a "Heisenbug" or "nasal demons" type of UB; most likely, the implementation will provide definitions for this behaviour. As long as these definitions satisfy the preconditions of the code, it's fine. The preconditions here being:

  • Type punning is allowed by the implementation.
  • sizeof(float) == sizeof(long).
  • The internal representation of float is such that if the bits are interpreted as a long, 0x5f3759df and >> 1 have the meaning reuqired for this hack to function.

However, it is impossible to re-write this in a fully portable way (i.e. without UB) - what would you do about platforms which use a different implementation of floating-point arithmetic, for example. As long as you're relying a particular platform's specifics, you're relying on behaviour which is undefined in the standard, but defined by the platform.

查看更多
兄弟一词,经得起流年.
3楼-- · 2019-02-12 01:09

You should use memcpy. AFAIK this is the only standard compliant way and compilers are smart enough to replace the call with a single word move instruction. See this question for reasoning behind these claims.

查看更多
不美不萌又怎样
4楼-- · 2019-02-12 01:14

Not. The algorithm starts out by assuming IEEE754 layout, which isn't a valid Standard-compliant assumption. The particular constant 0x5f3759df would be utterly wrong for any other layout. Even the assumption that such a constant exists is flawed. I think it breaks as soon as you'd reverse the order of the fields inside a float, for instance.

I'm also not sure if Q_rsqrt(-0.0f) works correctly even assuming IEEE754. Both zeroes (positive and negative) should be equal and sqrt(x) is only undefined for x<-0.0f

查看更多
劳资没心,怎么记你
5楼-- · 2019-02-12 01:19

The standard compliant way would be std::memcpy. This should be sufficiently standard-compliant under your specified assumptions. Any reasonable compiler will turn that into a bunch of register moves if possible anyway. Furthermore, we can also alleviate (or at least check) some of your made assumptions using C++11's static_assert and fixed width integer types from <cstdint>. Endianness is irrelevant anyway, since we're not working on any arrays here, if an integer type is little-endian, a floating point type is also.

float Q_rsqrt( float number )
{
    static_assert(std::numeric_limits<float>::is_iec559, 
                  "fast inverse square root requires IEEE-comliant 'float'");
    static_assert(sizeof(float)==sizeof(std::uint32_t), 
                  "fast inverse square root requires 'float' to be 32-bit");
    float x2 = number * 0.5F, y = number;
    std::uint32_t i;
    std::memcpy(&i, &y, sizeof(float));
    i  = 0x5f3759df - ( i >> 1 );
    std::memcpy(&y, &i, sizeof(float));
    return y * ( 1.5F - ( x2 * y * y ) );
}
查看更多
登录 后发表回答