Why does “dtoa.c” contain so much code?

Posted 2019-01-17 07:28

Question:

I'll be the first to admit that my overall knowledge of low-level programming is a bit sparse. I understand many of the core concepts, but I do not use them on a regular basis. That being said, I was absolutely astounded at how much code was needed for dtoa.c.

For the past couple of months I have been working on an ECMAScript implementation in C#, and I've been slowly filling in the holes in my engine. Last night I started working on Number.prototype.toString, which is described in section 15.7.4.2 of the ECMAScript specification (pdf). In section 9.8.1, NOTE 3 offers a link to dtoa.c, but I was looking for a challenge, so I waited to view it. The following is what I came up with.

private IDynamic ToString(Engine engine, Args args)
{
    var thisBinding = engine.Context.ThisBinding;
    if (!(thisBinding is NumberObject) && !(thisBinding is NumberPrimitive))
    {
        throw RuntimeError.TypeError("The current 'this' must be a number or a number object.");
    }

    var num = thisBinding.ToNumberPrimitive();

    if (double.IsNaN(num))
    {
        return new StringPrimitive("NaN");
    }
    else if (double.IsPositiveInfinity(num))
    {
        return new StringPrimitive("Infinity");
    }
    else if (double.IsNegativeInfinity(num))
    {
        return new StringPrimitive("-Infinity");
    }

    var radix = !args[0].IsUndefined ? args[0].ToNumberPrimitive().Value : 10D;

    if (radix < 2D || radix > 36D)
    {
        throw RuntimeError.RangeError("The parameter [radix] must be between 2 and 36.");
    }
    else if (radix == 10D)
    {
        return num.ToStringPrimitive();
    }

    var sb = new StringBuilder();
    var isNegative = false;

    if (num < 0D)
    {
        isNegative = true;
        num = -num;
    }

    var integralPart = Math.Truncate(num);
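    // Subtracting the integral part is exact in double arithmetic; a round trip
    // through decimal would throw OverflowException for values beyond decimal's range.
    var decimalPart = num.Value - integralPart;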
    var radixChars = RadixMap.GetArray((int)radix);

    if (integralPart == 0D)
    {
        sb.Append('0');
    }
    else
    {
        var integralTemp = integralPart;
        while (integralTemp > 0)
        {
            sb.Append(radixChars[(int)(integralTemp % radix)]);
            integralTemp = Math.Truncate(integralTemp / radix);
        }
    }

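    // Digits were appended least-significant first, so reverse them in place.
    // Stop at the midpoint; going further would swap pairs back again.
    for (int i = 0, k = sb.Length - 1; i < k; i++, k--)
    {
        var swap = sb[i];
        sb[i] = sb[k];
        sb[k] = swap;
    }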

    if (isNegative)
    {
        sb.Insert(0, '-');
    }

    if (decimalPart == 0D)
    {
        return new StringPrimitive(sb.ToString());
    }

    var runningValue = 0D;
    var decimalIndex = 1D;
    var decimalTemp = decimalPart;

    sb.Append('.');
    while (decimalIndex < 100 && decimalPart - runningValue > 1.0e-50)
    {
        var result = decimalTemp * radix;
        var integralResult = Math.Truncate(result);
        runningValue += integralResult / Math.Pow(radix, decimalIndex++);
        decimalTemp = result - integralResult;
        sb.Append(radixChars[(int)integralResult]);
    }

    return new StringPrimitive(sb.ToString());
}

Can anyone with more experience in low-level programming explain why dtoa.c has roughly 40 times as much code? I just cannot imagine C# being that much more productive.

Answer 1:

dtoa.c contains two main functions: dtoa(), which converts a double to a string, and strtod(), which converts a string to a double. It also contains a lot of support functions, most of which are for its own implementation of arbitrary-precision arithmetic. dtoa.c's claim to fame is getting these conversions right, and that can only be done, in general, with arbitrary-precision arithmetic. It also has code to round conversions correctly in four different rounding modes.

Your code only tries to implement the equivalent of dtoa(), and since it uses floating-point arithmetic to do its conversions, it will not always get them right. (Update: see my article http://www.exploringbinary.com/quick-and-dirty-floating-point-to-decimal-conversion/ for details.)

(I've written a lot about this on my blog, http://www.exploringbinary.com/ . Six of my last seven articles have been about strtod() conversions alone. Read through them to see how complicated it is to do correctly rounded conversions.)
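To make the point about floating-point conversions concrete, here is a minimal sketch (not code from dtoa.c) that generates fraction digits in two ways: with double arithmetic, in the style of the loop in the question, and exactly, using System.Numerics.BigInteger. The class and method names are hypothetical, and both methods assume 0 <= fraction < 1 and a radix between 2 and 36.

```csharp
using System;
using System.Numerics;
using System.Text;

// Hypothetical helper for illustration only.
public static class FractionDigits
{
    private const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Digit generation with double arithmetic, as in the question's loop.
    public static string WithDoubles(double fraction, int radix, int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            double scaled = fraction * radix; // may be rounded to the nearest double
            int digit = (int)scaled;          // truncates toward zero
            sb.Append(Alphabet[digit]);
            fraction = scaled - digit;
        }
        return sb.ToString();
    }

    // The same digits computed exactly: the double is treated as the
    // rational number numerator / 2^shift, held in big integers.
    public static string Exact(double fraction, int radix, int count)
    {
        long bits = BitConverter.DoubleToInt64Bits(fraction);
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long stored = bits & ((1L << 52) - 1);

        // Normal numbers carry an implicit leading 1 bit; subnormals do not.
        BigInteger numerator = biasedExponent == 0 ? stored : stored | (1L << 52);
        int shift = 1075 - (biasedExponent == 0 ? 1 : biasedExponent);
        BigInteger mask = (BigInteger.One << shift) - 1;

        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            numerator *= radix;                             // exact, no rounding
            sb.Append(Alphabet[(int)(numerator >> shift)]); // integer part is the digit
            numerator &= mask;                              // keep the fractional part
        }
        return sb.ToString();
    }
}
```

For 0.1 in base 3, already the first multiplication (0.1 * 3) is rounded, and the error triples with every further digit. On a runtime that evaluates double arithmetic in strict IEEE 754 double precision, FractionDigits.WithDoubles(0.1, 3, 40) and FractionDigits.Exact(0.1, 3, 40) start out identical and then disagree somewhere within those 40 digits.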



Answer 2:

Producing good results for conversions between decimal and binary floating point representations is a rather difficult problem.

The major source of difficulty is that many decimal fractions, even simple ones, cannot be represented exactly in binary floating point: 0.5 can (obviously), but 0.1 cannot. Going the other way (from binary to decimal), you generally don't want the exact result. For example, the exact decimal value of the closest number to 0.1 that can be represented in an IEEE-754-compliant double is actually 0.1000000000000000055511151231257827021181583404541015625, so you normally want some rounding.
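The exact value quoted above can be reproduced with a short sketch. It is an illustration under stated assumptions, not how dtoa.c works: it uses the fact that every finite double is an integer mantissa times a power of two, and that dividing by 2^k is the same as multiplying by 5^k and shifting the decimal point k places. ExactDouble and ToExactDecimal are hypothetical names.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only.
public static class ExactDouble
{
    // Returns the exact decimal expansion of a finite double.
    public static string ToExactDecimal(double value)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
            throw new ArgumentException("A finite value is required.", nameof(value));

        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long stored = bits & ((1L << 52) - 1);

        // |value| == mantissa * 2^exponent
        long mantissa = biasedExponent == 0 ? stored : stored | (1L << 52);
        int exponent = (biasedExponent == 0 ? 1 : biasedExponent) - 1075;

        string digits;
        int decimals = 0;
        if (exponent >= 0)
        {
            digits = (new BigInteger(mantissa) << exponent).ToString();
        }
        else
        {
            // mantissa / 2^k == (mantissa * 5^k) / 10^k
            decimals = -exponent;
            digits = (mantissa * BigInteger.Pow(5, decimals)).ToString();
        }

        // Place the decimal point, then drop trailing zeros of the fraction.
        digits = digits.PadLeft(decimals + 1, '0');
        string whole = digits.Substring(0, digits.Length - decimals);
        string fraction = digits.Substring(digits.Length - decimals).TrimEnd('0');

        string text = fraction.Length == 0 ? whole : whole + "." + fraction;
        return negative ? "-" + text : text;
    }
}
```

ExactDouble.ToExactDecimal(0.1) returns 0.1000000000000000055511151231257827021181583404541015625, which is rarely what a user wants to see. That is why a conversion routine has to decide where to round.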

So, conversion often involves approximation. Good conversion routines guarantee to produce the closest possible approximation within particular (word size or number of digits) constraints. This is where most of the complexity comes from.
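One common form of "closest approximation within a digit constraint" is the shortest decimal string that still parses back to the same double. The sketch below is a brute-force approximation of that idea and not the algorithm dtoa.c uses: it simply tries 1 to 17 significant digits. It leans entirely on the runtime's own formatter and parser, and assumes they are correctly rounded (an assumption that holds for .NET Core 3.0 and later, but not necessarily for older runtimes). The names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class ShortestRoundTrip
{
    // Returns the first rendering, from 1 to 17 significant digits,
    // that parses back to exactly the same double.
    public static string Format(double value)
    {
        for (int precision = 1; precision <= 17; precision++)
        {
            string text = value.ToString("G" + precision, CultureInfo.InvariantCulture);
            if (double.Parse(text, CultureInfo.InvariantCulture) == value)
            {
                return text;
            }
        }

        // Only reached for values such as NaN, which never compare equal.
        return value.ToString("G17", CultureInfo.InvariantCulture);
    }
}
```

With a correctly rounding runtime, ShortestRoundTrip.Format(0.1) gives 0.1 and ShortestRoundTrip.Format(0.1 + 0.2) gives 0.30000000000000004. The hard part is hidden inside ToString and Parse, which is exactly the work that dtoa() and strtod() do.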

Take a look at the paper cited in the comment at the top of the dtoa.c implementation, Clinger's How to Read Floating Point Numbers Accurately, for a flavour of the problem; and perhaps the paper by David M. Gay (the author of dtoa.c), Correctly Rounded Binary-Decimal and Decimal-Binary Conversions.

(Also, more generally: What Every Computer Scientist Should Know About Floating-Point Arithmetic.)



Answer 3:

Based on a quick glance, a fair amount of the C version deals with portability: the file looks like it is meant to be usable across compilers (C and C++), word sizes, floating-point implementations, and platforms, with tons of #define configurability.



Answer 4:

I also think that the code in dtoa.c might be more efficient (independent of language). For example, it seems to be doing some bit-fiddling, which in the hands of an expert often means speed. I assume it simply uses a less intuitive algorithm for speed reasons.
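As a rough idea of what bit-fiddling on a double looks like, here is a small C# sketch (not taken from dtoa.c) that pulls the sign, mantissa, and exponent straight out of the IEEE 754 bit pattern instead of using arithmetic. The names DoubleBits and Decompose are hypothetical, and the tuple syntax assumes C# 7 or later.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class DoubleBits
{
    // Splits a finite double so that |value| == Mantissa * 2^Exponent.
    public static (bool Negative, long Mantissa, int Exponent) Decompose(double value)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
            throw new ArgumentException("A finite value is required.", nameof(value));

        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;                         // bit 63 is the sign
        int biasedExponent = (int)((bits >> 52) & 0x7FF); // bits 52..62
        long stored = bits & ((1L << 52) - 1);            // bits 0..51

        if (biasedExponent == 0)
        {
            // Zero and subnormals: no implicit leading 1 bit.
            return (negative, stored, -1074);
        }

        // Normal numbers: restore the implicit leading 1 bit.
        return (negative, stored | (1L << 52), biasedExponent - 1075);
    }
}
```

For example, DoubleBits.Decompose(0.1) yields a mantissa of 7205759403792794 and an exponent of -56, so the stored value is exactly 7205759403792794 / 2^56. Working on these integers directly avoids the rounding that each floating-point operation can introduce.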



Answer 5:

Short answer: because dtoa.c works.

This is exactly the difference between a well-debugged product and an NIH prototype.