Why doesn't a compiler optimize floating-point *2 into an exponent increment?

Posted 2020-05-29 13:12

I've often noticed gcc converting multiplications into shifts in the executable. Something similar might happen when multiplying an int and a float. For example, 2 * f might simply increment the exponent of f by 1, saving some cycles. Do compilers generally do this, perhaps if one requests it (e.g. via -ffast-math)?

Are compilers generally smart enough to do this, or do I need to do this myself using the scalb*() or ldexp()/frexp() function family?
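
If you do end up doing it by hand, ldexp() is the portable way to scale by a power of two. A minimal sketch follows; whether it beats a plain 2.0 * f is entirely implementation-dependent:

    #include <math.h>

    /* Scale f by 2^1; the result is exact whenever it is representable.
       A good libm/compiler may lower this to cheap exponent manipulation,
       or simply to a multiply. */
    double twice(double f)
    {
        return ldexp(f, 1);
    }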

8 Answers
做个烂人
#2 · 2020-05-29 13:25

A previous Stack Overflow question about multiplication by powers of 2 covers this. The consensus there, and the actual implementations, showed that unfortunately there is currently no way to be more efficient than a standard multiplication.

Luminary・发光体
#3 · 2020-05-29 13:27

It's not about compilers or compiler writers not being smart. It's more about obeying the standard and producing all the necessary "side effects" such as Infs, NaNs, and denormals.

It is also about not producing other side effects that are not called for, such as reading memory. But I do recognize that it can be faster in some circumstances.

啃猪蹄的小仙女
#4 · 2020-05-29 13:35

"For example, 2 * f might simply increment the exponent of f by 1, saving some cycles."

This simply isn't true.

First, you have too many corner cases such as zero, infinity, NaN, and denormals. Then you have the performance issue.

The misunderstanding is in thinking that incrementing the exponent is faster than doing a multiplication; it is not.

If you look at the hardware instructions, there is no direct way to increment the exponent. So what you need to do instead is:

  1. Bitwise convert into integer.
  2. Increment the exponent.
  3. Bitwise convert back to floating-point.

There is generally a medium to large latency for moving data between the integer and floating-point execution units. So in the end, this "optimization" becomes much worse than a simple floating-point multiply.

So the reason the compiler doesn't do this "optimization" is that it isn't any faster.
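
To make the three steps concrete, here is a minimal, hypothetical C sketch of that sequence. It only works when both x and 2*x are normal, finite, nonzero doubles; it mishandles zeros, subnormals, overflow, infinities, and NaNs:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch: "double x by bumping the exponent".
       Correct only when x and 2*x are both normal, finite, nonzero doubles. */
    static double double_by_exponent_bump(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);   /* 1. bitwise convert to integer     */
        bits += 1ULL << 52;               /* 2. increment the exponent field   */
        memcpy(&x, &bits, sizeof bits);   /* 3. bitwise convert back to double */
        return x;
    }

Even in this stripped-down form, the round trip between the floating-point and integer domains is exactly the data movement whose latency makes it slower than a plain multiply.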

对你真心纯属浪费
#5 · 2020-05-29 13:35

If you think that multiplying by two means increasing the exponent by 1, think again. Here are the possible cases for IEEE 754 floating-point arithmetic:

Case 1: Infinity and NaN stay unchanged.

Case 2: Floating-point numbers with the largest possible finite exponent are changed to Infinity by increasing the exponent and setting the mantissa to zero (the sign bit is left unchanged).

Case 3: Normalised floating-point numbers with exponent less than the maximum possible exponent have their exponent increased by one. Yippee!!!

Case 4: Denormalised floating-point numbers with the highest mantissa bit set have their exponent increased by one, turning them into normalised numbers.

Case 5: Denormalised floating-point numbers with the highest mantissa bit cleared, including +0 and -0, have their mantissa shifted to the left by one bit position, leaving the exponent unchanged.

I very much doubt that a compiler producing integer code handling all these cases correctly will be anywhere near as fast as the floating-point built into the processor. And it's only suitable for multiplication by 2.0. For multiplication by 4.0 or 0.5, a whole new set of rules applies.

For the case of multiplication by 2.0, you might try to replace x * 2.0 with x + x, and many compilers do this. They do it because a processor might, for example, be able to do one addition and one multiplication at the same time, but not two of the same kind. So sometimes you would prefer x * 2.0 and sometimes x + x, depending on what other operations need doing at the same time.
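
A hypothetical C sketch of integer code covering all five cases (for doubles) might look like the following; it is meant to show how much work hides behind "just increment the exponent", not to be used in practice:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch: multiply a double by 2.0 using only integer
       operations on its IEEE 754 bit pattern, covering the five cases above. */
    static double times_two(double x)
    {
        uint64_t bits, sign, expo;
        memcpy(&bits, &x, sizeof bits);

        sign = bits & 0x8000000000000000ULL;   /* sign bit        */
        expo = (bits >> 52) & 0x7FF;           /* 11-bit exponent */

        if (expo == 0x7FF) {
            /* Case 1: infinity and NaN stay unchanged. */
        } else if (expo == 0x7FE) {
            /* Case 2: largest finite exponent -> infinity, mantissa cleared. */
            bits = sign | (0x7FFULL << 52);
        } else if (expo != 0) {
            /* Case 3: normal number -> add one to the exponent. */
            bits += 1ULL << 52;
        } else {
            /* Cases 4 and 5: subnormals and zeros -> shift the mantissa left;
               a carry out of the top mantissa bit lands in the exponent field,
               which is exactly the "becomes a normalised number" case. */
            bits = sign | ((bits & 0x000FFFFFFFFFFFFFULL) << 1);
        }

        memcpy(&x, &bits, sizeof bits);
        return x;
    }

And this still only handles 2.0; each other power of two would need its own variant.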

仙女界的扛把子
#6 · 2020-05-29 13:36

On modern CPUs, multiplication typically has one-per-cycle throughput and low latency. If the value is already in a floating-point register, there's no way you'll beat that by juggling it around to do integer arithmetic on the representation. If it's in memory to begin with, and if you're assuming neither the current value nor the correct result would be zero, denormal, NaN, or infinity, then it might be faster to perform something like

addl $0x100000, 4(%eax)   # x86 asm: add 1 to the exponent field in the high word of the double at (%eax)

to multiply by two; the only time I could see this being beneficial is if you're operating on a whole array of floating-point data that's bounded away from zero and infinity, and scaling by a power of two is the only operation you'll be performing (so you don't have any existing reason to be loading the data into floating point registers).
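
A hypothetical C version of that whole-array scenario might look like this; it is only valid under the stated assumptions (every input and every result is a normal, finite, nonzero number):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch: scale an array of doubles by 2^k using integer adds
       on the stored bit patterns. Valid only if all inputs and results are
       normal, finite, and nonzero (bounded away from zero and infinity). */
    static void scale_by_pow2(double *a, size_t n, int k)
    {
        for (size_t i = 0; i < n; i++) {
            uint64_t bits;
            memcpy(&bits, &a[i], sizeof bits);
            bits += (uint64_t)(int64_t)k << 52;   /* add k to the biased exponent */
            memcpy(&a[i], &bits, sizeof bits);
        }
    }

Whether this actually wins over a vectorized floating-point multiply is something you would have to measure; on many machines the multiply is just as cheap.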

ら.Afraid
#7 · 2020-05-29 13:37

Common floating-point formats, particularly IEEE 754, do not store the exponent as a simple integer, and treating it as an integer will not produce correct results.

In 32-bit float or 64-bit double, the exponent field is 8 or 11 bits, respectively. The exponent codes 1 to 254 (in float) or 1 to 2046 (in double) do act like integers: If you add one to one of these values and the result is one of these values, then the represented value doubles. However, adding one fails in these situations:

  • The initial value is 0 or subnormal. In this case, the exponent field starts at zero, and adding one to it adds 2^-126 (in float) or 2^-1022 (in double) to the number; it does not double the number.
  • The initial value exceeds 2^127 (in float) or 2^1023 (in double). In this case, the exponent field starts at 254 or 2046, and adding one to it changes the number to a NaN; it does not double the number.
  • The initial value is infinity or a NaN. In this case, the exponent field starts at 255 or 2047, and adding one to it changes it to zero (and is likely to overflow into the sign bit). The result is zero or a subnormal but should be infinity or a NaN, respectively.

(The above is for positive signs. The situation is symmetric with negative signs.)

As others have noted, some processors do not have facilities for manipulating the bits of floating-point values quickly. Even on those that do, the exponent field is not isolated from the other bits, so you typically cannot add one to it without overflowing into the sign bit in the last case above.

Although some applications can tolerate shortcuts such as neglecting subnormals or NaNs or even infinities, it is rare that applications can ignore zero. Since adding one to the exponent fails to handle zero properly, it is not usable.
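
As a concrete illustration of the first bullet, here is a tiny, hypothetical test showing that bumping the exponent field of 0.0f produces 2^-126 (FLT_MIN) rather than 0:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float x = 0.0f;
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits += 1u << 23;                /* add one to the 8-bit exponent field */
        memcpy(&x, &bits, sizeof bits);
        printf("%g\n", x);               /* prints 1.17549e-38, i.e. 2^-126, not 0 */
        return 0;
    }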
