Mantissa Normalization of C# double

Posted 2019-05-02 00:21

EDIT: Got it to work now. When normalizing the mantissa, it is important to set the implicit bit first; when decoding, the implicit bit then does not have to be added. I left the marked answer as correct, as the information there really helped.

I'm currently implementing an encoding (Distinguished Encoding Rules, DER) and have a slight problem encoding double values.

So, I can get the sign, exponent and mantissa out of a double in C# by using:

 // get parts
 double value = 10.0;
 long bits = BitConverter.DoubleToInt64Bits(value);
 // The sign bit is the top bit, so a negative long means a negative double
 bool negative = (bits < 0);
 int exponent = (int)((bits >> 52) & 0x7ffL);
 long mantissa = bits & 0xfffffffffffffL;

(using code from here). These values can be encoded and a simple reversal of the process will get me back the original double.
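The round trip mentioned above can be sketched as follows. This is only an illustration, not the code from the linked post: the class name `DoubleBitsRoundTrip` and the methods `Split` and `Join` are hypothetical, and the value tuples assume C# 7.0 or later. The fields are the raw encoded ones (biased exponent, no implicit bit).

```csharp
using System;

static class DoubleBitsRoundTrip
{
    // Split a double into its raw IEEE 754 fields.
    // The exponent is still biased and the mantissa lacks the implicit bit.
    public static (bool Negative, int Exponent, long Mantissa) Split(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;                      // sign bit is bit 63
        int exponent = (int)((bits >> 52) & 0x7ffL);   // 11-bit biased exponent
        long mantissa = bits & 0xfffffffffffffL;       // 52 stored fraction bits
        return (negative, exponent, mantissa);
    }

    // Reverse of Split: put the three raw fields back together.
    public static double Join(bool negative, int exponent, long mantissa)
    {
        long bits = ((long)exponent << 52) | mantissa;
        if (negative)
        {
            bits |= long.MinValue;                     // set bit 63
        }
        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

For 10.0, `Split` yields a biased exponent of 1026 and a mantissa of `1L << 50`, and `Join` on those fields returns 10.0 again. This only works as long as the fields are left untouched; shifting the mantissa without the implicit bit breaks it.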

However, the DER encoding rules specify that the mantissa should be normalized:

In the Canonical Encoding Rules and the Distinguished Encoding Rules normalization is specified and the mantissa (unless it is 0) needs to be repeatedly shifted until the least significant bit is a 1.

(see here in section 8.5.6.5).

Doing this by hand using:

 while ((mantissa & 1) == 0)
 {
     mantissa >>= 1;
     exponent++;
 }

does not work and gives me strange values (even when using the whole function Jon Skeet posted in the aforementioned link).

I seem to be missing something here. It would be easiest if I could first normalize the mantissa of the double and then get the "bits". However, I also can't really see why the normalization by hand doesn't work correctly.

Thanks for any help,

Danny

EDIT: Actual working program showing my issue with mantissa normalization:

    static void Main(string[] args)
    {
        Console.WriteLine(CalculateDouble(GetBits(55.5, false))); 
        Console.WriteLine(CalculateDouble(GetBits(55.5, true)));
        Console.ReadLine();
    }

    private static double CalculateDouble(Tuple<bool, int, long> bits)
    {
        double result = 0;
        bool isNegative = bits.Item1;
        int exponent = bits.Item2;
        long significand = bits.Item3;

        if (exponent == 2047 && significand != 0)
        {
            // special case
        }
        else if (exponent == 2047 && significand == 0)
        {
            result = isNegative ? double.NegativeInfinity : double.PositiveInfinity;
        }
        else if (exponent == 0)
        {
            // special case, subnormal numbers
        }
        else
        {
            // Old code (no longer needed because GetBits now sets the implicit bit):
            // double actualSignificand = significand*Math.Pow(2, -52) + 1;
            double actualSignificand = significand*Math.Pow(2, -52);
            int actualExponent = exponent - 1023;
            if (isNegative)
            {
                result = -actualSignificand*Math.Pow(2, actualExponent);
            }
            else 
            {
                result = actualSignificand*Math.Pow(2, actualExponent);
            }
        }
        return result;

    }


    private static Tuple<bool, int, long> GetBits(double d, bool normalizeSignificand)
    {
        // Translate the double into sign, exponent and mantissa.
        long bits = BitConverter.DoubleToInt64Bits(d);
        // The sign bit is the top bit, so a negative long means a negative double
        bool negative = (bits < 0);
        int exponent = (int)((bits >> 52) & 0x7ffL);
        long significand = bits & 0xfffffffffffffL;

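        if (exponent == 0 && significand == 0)
        {
            // Only a true zero has nothing to normalize; powers of two also have
            // all-zero fraction bits but still carry the implicit bit.
            return Tuple.Create<bool, int, long>(false, 0, 0);
        }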
        // fix: add implicit bit before normalization
        if (exponent != 0)
        {
            significand = significand | (1L << 52);
        }
        if (normalizeSignificand)
        {
            // Normalize
            while ((significand & 1) == 0)
            {
                // i.e., the significand is even
                significand >>= 1;
                exponent++;
            }
        }
        return Tuple.Create(negative, exponent, significand);

    }
Output (from the code before the implicit-bit fix):
    55.5
    2.25179981368527E+15

1 answer
虎瘦雄心在
#2 · 2019-05-02 01:20

When you use BitConverter.DoubleToInt64Bits, it gives you the double value already encoded in IEEE 754 format. This means the significand is encoded with an implicit leading bit. (“Significand” is the preferred term for the fraction portion of a floating-point value and is used in IEEE 754. A significand is linear. A mantissa is logarithmic. “Mantissa” stems from the days when people had to use logarithms and paper and tables of functions to do crude calculations.) To recover the unencoded significand, you would have to restore the implicit bit.

That is not hard. Once you have separated the sign bit, the encoded exponent (as an integer), and the encoded significand (as an integer), then, for 64-bit binary floating-point:

  • If the encoded exponent is its maximum (2047) and the encoded significand is non-zero, the value is a NaN. There is additional information in the significand about whether the NaN is signaling or not and other user- or implementation-defined information.
  • If the encoded exponent is its maximum and the encoded significand is zero, the value is an infinity (+ or – according to the sign).
  • If the encoded exponent is zero, the implicit bit is zero, the actual significand is the encoded significand multiplied by 2^-52, and the actual exponent is one minus the bias (1023) (so -1022).
  • Otherwise, the implicit bit is one, the actual significand is the encoded significand first multiplied by 2^-52 and then added to one, and the actual exponent is the encoded exponent minus the bias (1023).

(If you want to work with integers and not have fractions for the significand, you can omit the multiplications by 2^-52 and add -52 to the exponent instead. In the last case, the significand is added to 2^52 instead of to one.)
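The cases above, in their integer form, could be sketched in C# roughly like this. It is a minimal sketch rather than a complete DER encoder: the class `DerRealParts` and the methods `Decompose` and `Compose` are hypothetical names, value tuples assume C# 7.0 or later, and NaNs and infinities are simply rejected. `Compose` relies on `Math.Pow` and is only meant as a sanity check.

```csharp
using System;

static class DerRealParts
{
    // Decode a finite double so that |value| == significand * 2^exponent,
    // with an integer significand. If normalize is true, the significand is
    // shifted right until its least significant bit is 1.
    public static (bool Negative, long Significand, int Exponent) Decompose(double value, bool normalize)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
        {
            throw new ArgumentException("NaN and infinities need separate handling.", nameof(value));
        }

        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;
        int encodedExponent = (int)((bits >> 52) & 0x7ffL);
        long significand = bits & 0xfffffffffffffL;
        int exponent;

        if (encodedExponent == 0)
        {
            // Zero or subnormal: implicit bit is 0, actual exponent is 1 - bias.
            exponent = 1 - 1023 - 52;
        }
        else
        {
            // Normal number: restore the implicit leading 1 before shifting.
            significand |= 1L << 52;
            exponent = encodedExponent - 1023 - 52;
        }

        if (significand == 0)
        {
            return (negative, 0L, 0);   // zero: nothing to normalize
        }

        if (normalize)
        {
            while ((significand & 1) == 0)
            {
                significand >>= 1;
                exponent++;
            }
        }
        return (negative, significand, exponent);
    }

    // Rebuild the double from the decomposed parts (for checking only).
    public static double Compose(bool negative, long significand, int exponent)
    {
        double magnitude = significand * Math.Pow(2, exponent);
        return negative ? -magnitude : magnitude;
    }
}
```

With this sketch, `Decompose(55.5, true)` gives a significand of 111 and an exponent of -1 (111 / 2 = 55.5), and `Compose` on those parts returns 55.5. The -52 offset is folded into the exponent up front, so the decoder does not need to know whether normalization took place.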

There is an alternative method that avoids BitConverter and the IEEE-754 encoding. If you can call the C library's frexp routine from C#, it will return the fraction and exponent mathematically instead of as encodings. First, handle zeroes, infinities, and NaNs separately. Then use (shown in C syntax):

int exponent;
double fraction = frexp(value, &exponent);

This sets fraction to a value with magnitude in [½, 1) and exponent such that fraction · 2^exponent equals value. (Note that fraction still has the sign; you might want to separate that and use the absolute value.)

At this point, you can scale fraction as desired (and adjust exponent accordingly). To scale it so that it is an odd integer, you could multiply it by two repeatedly until it has no fractional part.
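If a frexp binding is not available, the same scaling idea can be sketched in plain C# arithmetic. This is a variation on the approach above, not frexp itself: it starts from the absolute value rather than from a fraction in [½, 1), so it also needs a second loop that divides by two while the integer is even. The class `OddSignificandScaling` and the method `ToOddInteger` are hypothetical names, and value tuples assume C# 7.0 or later.

```csharp
using System;

static class OddSignificandScaling
{
    // Scale |value| by powers of two until it is an odd integer.
    // Returns parts such that |value| == significand * 2^exponent.
    public static (bool Negative, long Significand, int Exponent) ToOddInteger(double value)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
        {
            throw new ArgumentException("NaN and infinities need separate handling.", nameof(value));
        }

        bool negative = value < 0;          // note: this does not detect -0.0
        double fraction = Math.Abs(value);
        int exponent = 0;

        if (fraction == 0)
        {
            return (negative, 0L, 0);       // zero is handled separately
        }

        // Multiply by two until no fractional part is left.
        // Doubling is exact in binary floating point.
        while (fraction != Math.Floor(fraction))
        {
            fraction *= 2;
            exponent--;
        }

        // Divide by two while the integer is still even.
        while (fraction % 2 == 0)
        {
            fraction /= 2;
            exponent++;
        }

        // An odd integer held in a double is below 2^53, so it fits in a long.
        return (negative, (long)fraction, exponent);
    }
}
```

For example, `ToOddInteger(55.5)` yields a significand of 111 with exponent -1, and `ToOddInteger(8.0)` yields a significand of 1 with exponent 3.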
