Reproducibility of floating-point operation results

Posted 2020-02-14 03:02

Question:

Is it possible for a floating-point arithmetic operation to yield different results on different CPUs? By CPUs I mean all x86 and x64 processors. By different results I mean even a single differing least significant bit. I need to know whether I can use floating-point operations in a project where it's vital that the same input produces exactly the same results on different machines.

Edit: added the C++ tag.
To clarify: I need the results to be reproducible at run time. I wouldn't expect identical results from different compilations.

Answer 1:

In the gaming industry this is referred to as deterministic lockstep, and is very important for real-time networked games where the clients and server need to be in agreement about the state of physics objects (players, projectiles, deformable terrain etc).

According to Glenn Fiedler's article on Floating Point Determinism, the answer is "a resoundingly limp maybe"; if you run the same binary on the same architecture and restrict the use of features that are less well specified than basic floating-point, then you can get the same results. Otherwise, if you use different compilers, or allow your code to use SSE or 80-bit floating point, then results will vary between different executables and different machines.
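A small C++ sketch of the 80-bit effect described above, using inputs chosen only for illustration (a = 1e16, b = 1.0): the expression `(a + b) - a` loses the 1.0 when the sum is rounded to a 64-bit double, but keeps it when the intermediate is an 80-bit x87 `long double`. The function names are hypothetical, and `volatile` is used to force rounding to double even on x87 builds that keep wider temporaries.

```cpp
#include <limits>

// 1e16 + 1 is exactly halfway between the doubles 1e16 and 1e16 + 2,
// so round-to-nearest-even gives 1e16 and the 1.0 is lost.
double cancel_in_double(double a, double b) {
    volatile double sum = a + b;  // volatile forces the sum to be stored as a 64-bit double
    return sum - a;
}

// Same expression with an explicitly wider intermediate. Where long double is
// the x87 80-bit format (64-bit significand), 1e16 + 1 is exact and survives.
long double cancel_in_long_double(double a, double b) {
    long double sum = static_cast<long double>(a) + b;
    return sum - a;
}

// True when long double carries more significand bits than double.
bool long_double_is_wider() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

On a typical x86-64 GCC or Clang build, `cancel_in_double(1e16, 1.0)` returns 0 while `cancel_in_long_double(1e16, 1.0)` returns 1; where `long double` is the same as `double` (as with MSVC), both return 0.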

Yosef Kreinin recommends:

  • scanning assembler output for algebraic optimisations and applying them to your source code;
  • suppressing fused multiply-add and other advanced instructions (e.g. the sin trigonometric function);
  • and using SSE or SSE2, or otherwise setting the x87 FPU control register's precision to 64-bit doubles instead of 80-bit extended precision. (Yes, this conflicts with Glenn Fiedler's recommendation.)
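To see why fused multiply-add matters, here is a minimal sketch (function names are hypothetical) that compares `a*b + c` computed with two roundings against `std::fma`, which rounds only once. With the assumed inputs a = b = 1 + 2^-27 and c = -(1 + 2^-26), the exact product is 1 + 2^-26 + 2^-54; rounding it to double discards the 2^-54 term, so the two versions disagree.

```cpp
#include <cmath>

// a*b + c with two roundings: the product is rounded to double first.
// volatile stops the compiler from contracting this into an FMA instruction.
double multiply_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// a*b + c with a single rounding of the exact result.
double fused_multiply_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With those inputs the unfused version yields 0 and the fused version yields 2^-54, so a compiler that contracts `a * b + c` into an FMA on one build but not on another changes the result.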

And of course, test your code on multiple different machines; take hashes of intermediate outputs, so you can tell just where and when your simulations are diverging.
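One way to take such hashes, sketched here as an assumption rather than anything the answer prescribes, is 64-bit FNV-1a over the exact bit patterns of the state. Hashing bits instead of comparing with `==` also catches differences such as `0.0` versus `-0.0`, and feeding bytes least-significant first keeps the hash independent of machine byte order. `hash_state` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");

// FNV-1a (64-bit) over the raw bit patterns of a sequence of doubles.
std::uint64_t hash_state(const std::vector<double>& values) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (double v : values) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // copy the bits without type-punning UB
        for (int i = 0; i < 8; ++i) {
            // Least-significant byte first, regardless of the machine's endianness.
            hash ^= (bits >> (8 * i)) & 0xFFu;
            hash *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    }
    return hash;
}
```

Each machine could log `hash_state(...)` of its simulation state after every step; the first step at which the logged values disagree shows where the runs diverged.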

Answer 2:

If you call a dynamically-linked library, you may get different code on different processors. (For example, the Accelerate library on Mac OS X uses different implementations of its routines on different processors.)
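The kind of per-processor dispatch described above can be sketched as follows. This is only an illustration, not how Accelerate works internally: `select_mul_add` and both kernels are hypothetical, and `__builtin_cpu_supports` is a GCC/Clang builtin for x86 targets. Because one kernel rounds once (FMA) and the other rounds twice, the same binary can return different results on CPUs with and without FMA support.

```cpp
#include <cmath>

// Two hypothetical implementations of the same routine, y = a*b + c.
double mul_add_generic(double a, double b, double c) {
    volatile double product = a * b;  // two roundings: product, then sum
    return product + c;
}

double mul_add_fma(double a, double b, double c) {
    return std::fma(a, b, c);  // one rounding
}

using MulAddFn = double (*)(double, double, double);

// Choose an implementation based on the CPU the program is running on.
// Non-x86 or non-GCC/Clang builds always use the generic kernel.
MulAddFn select_mul_add() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("fma")) return mul_add_fma;
#endif
    return mul_add_generic;
}
```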

However, if you use identical executable images (including all libraries) that do not dispatch based on processor model and have identical inputs (including any changes made to floating-point modes or other global state that can affect floating-point), then the processor produces identical results for all elementary floating-point arithmetic (add, subtract, multiply, divide, compare, convert).
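Floating-point modes are part of that global state. A minimal sketch using `<cfenv>` (the helper `divide_with_rounding` is hypothetical): the same division performed under `FE_DOWNWARD` and `FE_UPWARD` produces two different doubles whenever the quotient is inexact, so two runs of one binary agree only if they also use the same rounding mode.

```cpp
#include <cfenv>

#pragma STDC FENV_ACCESS ON  // declare that this code changes the FP environment

// Divide num by den under the given rounding mode, then restore the caller's mode.
double divide_with_rounding(double num, double den, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    // volatile keeps compilers that ignore the pragma (such as GCC) from folding
    // the division at compile time or moving it outside the mode change.
    volatile double n = num;
    volatile double d = den;
    volatile double q = n / d;
    std::fesetround(saved);
    return q;
}
```

For 1.0 / 3.0 the downward and upward calls return adjacent doubles; for an exact quotient such as 1.0 / 4.0 the rounding mode makes no difference.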

Certain operations might not be fully specified to return identical results on different processors, such as the reciprocal-square-root-estimate instruction (RSQRTSS/RSQRTPS on x86).
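As an illustration of an operation specified only up to an error bound, this sketch calls the x86 `RSQRTSS` instruction through the `_mm_rsqrt_ss` intrinsic. Intel documents its relative error as at most 1.5 × 2^-12 but does not pin down the exact output bits. The non-x86 fallback and the `HAVE_RSQRTSS` macro are assumptions added only so the sketch compiles elsewhere.

```cpp
#include <cmath>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_RSQRTSS 1
#endif

// Approximate 1/sqrt(x). On x86 this uses the RSQRTSS estimate, whose exact
// result may differ between processor implementations within the error bound.
float rsqrt_estimate(float x) {
#ifdef HAVE_RSQRTSS
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    // Fallback for non-x86 builds: the correctly rounded (reproducible) computation.
    return 1.0f / std::sqrt(x);
#endif
}
```

Code that needs bit-identical results across machines would use the correctly rounded `1.0f / std::sqrt(x)` instead of the estimate.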

Concerns mentioned in ecatmur’s answer about optimizations made by the compiler, fused multiply-add, and SSE/SSE2/FPU use, do not apply to identical binaries. Those concerns apply only when different compilations (different switches, different target platforms, different compiler versions) might produce different code. Since you have excluded different compilations, these concerns are not relevant.

If you build for both a 32-bit target (i386) and a 64-bit target (x86_64) you are making two executable images (in one “fat” file), and the concerns about different compiler products apply.
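A build can report which intermediate precision it uses through the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. This sketch (the function name is hypothetical) maps that macro to a significand width. GCC builds for i386 that use the x87 unit typically report 2, while x86-64 builds typically report 0, which is one concrete way the 32-bit and 64-bit images in a fat binary can differ.

```cpp
#include <cfloat>
#include <limits>

// Significand bits used for intermediate double-precision results in this build:
//   FLT_EVAL_METHOD 0 or 1 -> doubles are evaluated as double
//   FLT_EVAL_METHOD 2      -> everything is evaluated as long double
// Returns -1 for indeterminable (-1) or other implementation-defined values.
int double_intermediate_bits() {
#if FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1
    return std::numeric_limits<double>::digits;
#elif FLT_EVAL_METHOD == 2
    return std::numeric_limits<long double>::digits;
#else
    return -1;
#endif
}
```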