Auto vectorization on double and ffast-math

2019-04-08 21:46发布

Why is it mandatory to use -ffast-math with g++ to achieve the vectorization of loops using doubles? I don't like -ffast-math because I don't want to lose precision.

3条回答
2楼-- · 2019-04-08 22:02

Very likely because vectorization means that you may have different results, or may mean that you miss floating point signals/exceptions.

If you're compiling for 32 bit x86 then gcc and g++ default to using the x87 for floating point math, on 64 bit they default to sse, however the x87 can and will produce different values for the same computation so it's unlikely g++ will consider vectorizing if it can't guarantee that you will get the same results unless you use -ffast-math or some of the flags it turns on.

Basically it comes down to the floating point environment for vectorized code may not be the same as the one for non vectorized code, sometimes in ways that are important, if the differences don't matter to you, something like

-fno-math-errno -fno-trapping-math -fno-signaling-nans -fno-rounding-math

but first look up those options and make sure that they won't affect your program's correctness. -ffinite-math-only may help also

查看更多
手持菜刀,她持情操
3楼-- · 2019-04-08 22:07

You don’t necessarily lose precision with -ffast-math. It only affects the handling of NaN, Inf etc. and the order in which operations are performed.

If you have a specific piece of code where you do not want GCC to reorder or simplify computations, you can mark variables as being used using an asm statement.

For instance, the following code performs a rounding operation on f. However, the two f += g and f -= g operations are likely to get optimised away by gcc:

static double moo(double f, double g)                                      
{                                                                          
    g *= 4503599627370496.0; // 2 ** 52                                    
    f += g;                                                                
    f -= g;                                                                
    return f;                                                            
}                                                                     

On x86_64, you can use this asm statement to instruct GCC not to perform that optimisation:

static double moo(double f, double g)                                      
{                                                                          
    g *= 4503599627370496.0; // 2 ** 52                                    
    f += g;                                                                
    __asm__("" : "+x" (f));
    f -= g;
    return f;
}

You will need to adapt this for each architecture, unfortunately. On PowerPC, use +f instead of +x.

查看更多
做自己的国王
4楼-- · 2019-04-08 22:07

Because -ffast-math enables operands reordering which allows many code to be vectorized.

For example to calculate this

sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + … a[99]

the compiler is required to do the additions sequentially without -ffast-math, because floating-point math is neither commutative nor associative.

That's the same reason why compilers can't optimize a*a*a*a*a*a to (a*a*a)*(a*a*a) without -ffast-math

That means no vectorization available unless you have very efficient horizontal vector adds.

However if -ffast-math is enabled, the expression can be calculated like this (Look at A7. Auto-Vectorization)

sum0 = a[0] + a[4] + a[ 8] + … a[96]
sum1 = a[1] + a[5] + a[ 9] + … a[97]
sum2 = a[2] + a[6] + a[10] + … a[98]
sum3 = a[3] + a[7] + a[11] + … a[99]
sum’ = sum0 + sum1 + sum2 + sum3

Now the compiler can vectorize it easily by adding each column in parallel and then do a horizontal add at the end

Does sum’ == sum? Only if (a[0]+a[4]+…) + (a[1]+a[5]+…) + (a[2]+a[6]+…) + ([a[3]+a[7]+…) == a[0] + a[1] + a[2] + … This holds under associativity, which floats don’t adhere to, all of the time. Specifying /fp:fast lets the compiler transform your code to run faster – up to 4 times faster, for this simple calculation.

Do You Prefer Fast or Precise? - A7. Auto-Vectorization

It may be enabled by the -fassociative-math flag in gcc

Further readings

查看更多
登录 后发表回答