I have this assembly (AT&T syntax):
mulsd %xmm0, %xmm1
addsd %xmm1, %xmm2
I want to replace it with:
vfmadd231sd %xmm0, %xmm1, %xmm2
Will this transformation always leave equivalent state in all involved registers and flags? Or will the resulting floats differ slightly in some way? (If they differ, why is that?)
(About the FMA instructions: http://en.wikipedia.org/wiki/FMA_instruction_set)
No. In fact, a major part of the benefit of fused multiply-add is that it does not (necessarily) produce the same result as a separate multiply and add.
As a (somewhat contrived) example, suppose that we have:
double a = 1 + 0x1.0p-52; // 1 + 2**-52
double b = 1 - 0x1.0p-52; // 1 - 2**-52
and we want to compute a*b - 1. The "mathematically exact" value of a*b - 1 is:
(1 + 2**-52)(1 - 2**-52) - 1 = 1 + 2**-52 - 2**-52 - 2**-104 - 1 = -2**-104
but if we first compute a*b using multiplication, the exact product 1 - 2**-104 is not representable as a double and rounds to 1.0, so the subsequent subtraction of 1.0 produces a result of zero.
If we use fma(a,b,-1) instead, we eliminate the intermediate rounding of the product, which allows us to get the "real" answer, -0x1.0p-104.
Please note that not only do we get a different result, but different flags have been set as well; a separate multiply and subtract sets the inexact flag (the product had to be rounded), whereas the fused multiply-add does not set any flags, because its single final result is exactly representable.