OpenCL reduction result wrong with large floats

2020-03-31 06:44发布

问题:

I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65 536 using floating point precision. Unfortunately, the result is not correct. However, when I modify my code, so that I compute the sum of 65 536 smaller numbers (for example 1), the result is correct.

I couldn't find any error in the code. Is it possible that I am getting wrong results, because of the float type? If this is the case, what is the best approach to solve the issue?

回答1:

There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating point.

The correct sum is: 65537 * 32768 = 2147516416, and it takes 31 bits to represent it in binary (10000000000000001000000000000000). 32-bit floats can only hold integers accurately up to 2^24.

"Any integer with absolute value less than [2^24] can be exactly represented in the single precision format" "Floating Point" article, wikipedia

This is why you are getting the correct sum when it is less than or equal to 2^24. If you are doing a complete sum using single-precision, you will eventually lose accuracy no matter which device you are executing the kernel on. There are a few things you can do to get the correct answer:

  • use double instead of float if your platform supports it
  • use int or unsigned int
  • sum a smaller set of numbers eg: 0+1+2+...+4095+4096 = (2^23 + 2^11)

Read more about single precision here.



回答2:

This is a "side effect" of summing floating point numbers using finite precision CPU's or GPU's. The accuracy depends the algorithm and the order the values are summed. The theory and practice behind is explained in Nicholas J, Higham's paper

The Accuracy of Floating Point Summation

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=7AECC0D6458288CD6E4488AD63A33D5D?doi=10.1.1.43.3535&rep=rep1&type=pdf

The fix is to use a smarter algorithm like the Kahan Summation Algorithm

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

And the Higham paper has some alternatives too.

This problem illustrates the nature of benchmarking, the first rule of the benchmark is to get the right answer, using realistic data!