The x87 FPU is notable for using an internal 80-bit precision mode, which often leads to unexpected and unreproducible results across compilers and machines. In my search for reproducible floating-point math on .NET, I discovered that both major implementations of .NET (Microsoft's and Mono) emit SSE instructions rather than x87 in 64-bit mode.
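SSE(2) performs operations on 32-bit floats strictly at 32-bit precision, and operations on 64-bit floats strictly at 64-bit precision, with no wider intermediate format. Denormals can optionally be flushed to zero by setting the appropriate bits (FTZ and DAZ) in the MXCSR control register.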
It would therefore appear that SSE does not suffer from the precision-related issues of x87, and that the only variable is the denormal behavior, which can be controlled.
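To make the precision issue concrete, here is a minimal sketch that simulates the two behaviors in software. It is not real x87 or SSE code: Python's 64-bit float stands in for the x87's wider internal format, and the helper names (`to_f32`, `sum_strict_f32`, `sum_wide_intermediate`) are made up for this illustration. The only point is that rounding after every operation and rounding once at the end can disagree.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_strict_f32(values):
    """Round after every addition, the way scalar SSE treats 32-bit operands."""
    acc = to_f32(0.0)
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_wide_intermediate(values):
    """Keep the running sum in a wider format and round only at the end.

    Assumption: the double used here is only a stand-in for the x87's
    80-bit registers; the effect is the same in kind, not in detail.
    """
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return to_f32(acc)


# 2**24 is where 32-bit floats stop representing every integer.
values = [16777216.0, 1.0, 1.0]
strict = sum_strict_f32(values)
wide = sum_wide_intermediate(values)
print(strict, wide)  # 16777216.0 16777218.0
```

With per-operation rounding, each `+ 1.0` lands exactly halfway between two representable 32-bit floats and rounds back down to 2**24, so the increments are lost. With a wide intermediate, the sum reaches 2**24 + 2 before the single final rounding, and that value is representable. Whether a given x87 build keeps the intermediate in a register or spills it to memory is up to the compiler, which is where the irreproducibility comes from.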
Leaving aside the matter of transcendental functions (which, unlike on x87, are not natively provided by SSE), does using SSE guarantee reproducible results across machines and compilers? Could compiler optimizations, for instance, lead to different results? I found some conflicting opinions:
If you have SSE2, use it and live happily ever after. SSE2 supports both 32b and 64b operations and the intermediate results are of the size of the operands. - Yossi Kreinin, http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html
...
The SSE2 instructions (...) are fully IEEE754-1985 compliant, and they permit better reproducibility (thanks to the static rounding precision) and portability with other platforms. - Muller et al., Handbook of Floating-Point Arithmetic, p. 107
However:
Also, you can't use SSE or SSE2 for floating point, because it's too under-specified to be deterministic. - John Watte http://www.gamedev.net/topic/499435-floating-point-determinism/#entry4259411