Denormalized Numbers - IEEE 754 Floating Point

Posted 2019-01-11 18:26

Question:

So I'm trying to learn more about denormalized numbers as defined in the IEEE 754 standard for floating-point numbers. I've already read several articles thanks to Google search results, and I've gone through several Stack Overflow posts. However, I still have some unanswered questions.

First off, just to review my understanding of what a Denormalized float is:

Numbers which have fewer bits of precision, and are smaller (in magnitude) than normalized numbers

Essentially, a denormalized float has the ability to represent the SMALLEST (in magnitude) number that is possible to be represented with any floating point value.

Does that sound correct? Anything more to it than that?

I've read that:

using denormalized numbers comes with a performance cost on many platforms

Any comments on this?

I've also read in one of the articles that

one should "avoid overlap between normalized and denormalized numbers"

Any comments on this?

In some presentations of the IEEE standard, when floating point ranges are presented the denormalized values are excluded and the tables are labeled as an "effective range", almost as if the presenter is thinking "We know that denormalized numbers CAN represent the smallest possible floating point values, but because of certain disadvantages of denormalized numbers, we choose to exclude them from ranges that will better fit common use scenarios" -- As if denormalized numbers are not commonly used.

I guess I just keep getting the impression that using denormalized numbers turns out to not be a good thing in most cases?

If I had to answer that question on my own I would want to think that:

Using denormalized numbers is good because you can represent the smallest (in magnitude) numbers possible -- As long as precision is not important, and you do not mix them up with normalized numbers, AND the resulting performance of the application fits within requirements.

Using denormalized numbers is a bad thing because most applications do not require representations so small -- The precision loss is detrimental, and you can shoot yourself in the foot too easily by mixing them up with normalized numbers, AND the performance is not worth the cost in most cases.

Any comments on these two answers? What else might I be missing or not understand about denormalized numbers?

Answer 1:

Essentially, a denormalized float has the ability to represent the SMALLEST (in magnitude) number that is possible to be represented with any floating point value.

That is correct.

using denormalized numbers comes with a performance cost on many platforms

The penalty is different on different processors, but it can be up to 2 orders of magnitude. The reason? The same as for this advice:

one should "avoid overlap between normalized and denormalized numbers"

Here's the key: denormals are a fixed-point "micro-format" within the IEEE-754 floating-point format. In normal numbers, the exponent indicates the position of the binary point. In denormal numbers, the exponent field is all zeros and the 52 fraction bits are read as a fixed-point value with a fixed scale of 2^-1074 (for doubles).

So, denormals are slow because they require special handling. In practice, they occur very rarely, and chip makers don't like to spend too many valuable resources on rare cases.

Mixing denormals with normals is slow because then you're mixing formats and you have the additional step of converting between the two.
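
The fixed-point layout described above can be inspected directly. The following is a minimal Python sketch, assuming the platform's `float` is an IEEE-754 double (true for CPython on common hardware); the helper names `decode_double` and `is_denormal` are made up for this illustration.

```python
import math
import struct

def decode_double(x: float):
    """Split a double into (sign, exponent field, fraction field)."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # low 52 bits
    return sign, exponent, fraction

def is_denormal(x: float) -> bool:
    """A denormal has an all-zero exponent field and a nonzero fraction."""
    _, exponent, fraction = decode_double(x)
    return exponent == 0 and fraction != 0

smallest = math.ldexp(1.0, -1074)          # smallest positive double
print(decode_double(smallest))             # (0, 0, 1)
print(decode_double(1.0))                  # (0, 1023, 0)

# For a denormal, the value is just the fraction field scaled by 2**-1074.
value = math.ldexp(12345.0, -1074)
_, _, fraction = decode_double(value)
rebuilt = math.ldexp(float(fraction), -1074)
print(is_denormal(value), rebuilt == value)  # True True
```

Note that the scale never changes across the denormal range: only the integer in the fraction field does, which is what makes it behave like fixed point.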

I guess I just keep getting the impression that using denormalized numbers turns out to not be a good thing in most cases?

Denormals were created for one primary purpose: gradual underflow. It's a way to keep the relative difference between tiny numbers small. If you go straight from the smallest normal number to zero (abrupt underflow), the relative change is infinite. If you go to denormals on underflow, the relative change is still not fully accurate, but at least more reasonable. And that difference shows up in calculations.

To put it a different way: floating-point numbers are not distributed uniformly. There is always the same number of values between successive powers of two: 2^52 (for double precision). So without denormals, you always end up with a gap between 0 and the smallest floating-point number that is 2^52 times the size of the difference between the smallest two numbers. Denormals fill this gap uniformly.
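
The size of that gap can be checked numerically. This is a small Python sketch, again assuming `float` is an IEEE-754 double; the variable names are chosen only for this example.

```python
import math
import sys

smallest_normal = sys.float_info.min       # 2**-1022
step = math.ldexp(1.0, -1074)              # spacing of doubles near the smallest normal

# Without denormals, the gap from 0 to the smallest normal would be
# this many times larger than the spacing just above it.
gap_ratio = smallest_normal / step
print(gap_ratio == 2.0 ** 52)              # True

# With denormals, the spacing just below the smallest normal is the
# same as the spacing just above it, so the gap is filled uniformly.
above = smallest_normal + step
below = smallest_normal - step
print(above - smallest_normal == step)     # True
print(smallest_normal - below == step)     # True
```

The value `below` is the largest denormal, so the step size stays constant all the way down to zero.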

As an example about the effects of abrupt vs. gradual underflow, look at the mathematically equivalent x == y and x - y == 0. If x and y are tiny but different and you use abrupt underflow, then if their difference is less than the minimum cutoff value, their difference will be zero, and so the equivalence is violated.

With gradual underflow, the difference between two tiny but different normal numbers gets to be a denormal, which is still not zero. The equivalence is preserved.
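
The two behaviours can be compared in a short Python sketch. Python uses gradual underflow on typical hardware, so abrupt underflow is only emulated here by a hypothetical helper, `flush_to_zero`, which is not a real library function and only approximates what a flush-to-zero hardware mode would do.

```python
import sys

MIN_NORMAL = sys.float_info.min            # 2**-1022, smallest normal double

def flush_to_zero(value: float) -> float:
    """Hypothetical helper: emulate abrupt underflow by zeroing denormal results."""
    return 0.0 if abs(value) < MIN_NORMAL else value

# Two tiny, distinct normal numbers.
x = 1.5 * MIN_NORMAL
y = 1.25 * MIN_NORMAL

gradual_diff = x - y                       # 0.25 * MIN_NORMAL, a denormal
abrupt_diff = flush_to_zero(x - y)         # denormal result replaced by zero

print(x == y)                              # False
print(gradual_diff == 0.0)                 # False: agrees with x == y
print(abrupt_diff == 0.0)                  # True: contradicts x == y
```

Under the emulated abrupt underflow, `x - y == 0` holds even though `x == y` does not, which is exactly the broken equivalence described above.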

So, using denormals on purpose is not advised, because they were designed only as a backup mechanism in exceptional cases.