This is a follow-up to my original post. But I'll repeat it for clarity:
As per the DICOM standard, a floating point value can be stored using a Value Representation of Decimal String. See Table 6.2-1 (DICOM Value Representations):
Decimal String: A string of characters representing either a fixed point number or a floating point number. A fixed point number shall contain only the characters 0-9 with an optional leading "+" or "-" and an optional "." to mark the decimal point. A floating point number shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to indicate the start of the exponent. Decimal Strings may be padded with leading or trailing spaces. Embedded spaces are not allowed.
"0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default Character Repertoire. 16 bytes maximum
The standard is saying that the textual representation is fixed point vs. floating point. The standard only refers to how the values are represented within the DICOM data set itself. As such there is no requirement to load a fixed point textual representation into a fixed-point variable.
So now that it is clear that the DICOM standard implicitly recommends double (IEEE 754-1985) for representing a Value Representation of type Decimal String (maximum of 16 significant digits), my question is: how do I use the standard C I/O library to convert this binary representation from memory back into ASCII in this limited-size string?
From random sources on the internet, this is non-trivial, but a generally accepted solution is either:
printf("%1.16e\n", d); // Round-trippable double, always with an exponent
or
printf("%.17g\n", d); // Round-trippable double, shortest possible
Of course both expressions are invalid in my case since they can produce output much longer than my limited maximum of 16 bytes. So what is the solution for minimizing the loss in precision when writing out an arbitrary double value to a limited 16-byte string?
Edit: if this is not clear, I am required to follow the standard. I cannot use hex/uuencode encoding.
Edit 2: I am running the comparison using travis-ci; see here.
So far the suggested codes are compute1.c, compute2.c, compute3.c, and compute4.c. Results I see over here are:
compute1.c leads to a total sum error of: 0.0095729050923877828
compute2.c leads to a total sum error of: 0.21764383725715469
compute3.c leads to a total sum error of: 4.050031792674619
compute4.c leads to a total sum error of: 0.001287056579548422
So compute4.c leads to the best possible precision (0.001287056579548422 < 4.050031792674619), but triples (x3) the overall execution time (only tested in debug mode using the time command).
I think your best option is to use printf("%.17g\n", d); to generate an initial answer and then trim it. The simplest way to trim it is to drop digits from the end of the mantissa until it fits. This actually works very well but will not minimize the error because you are truncating instead of rounding to nearest.
A better solution would be to examine the digits to be removed, treating them as an n-digit number between 0.0 and 1.0, so '49' would be 0.49. If their value is less than 0.5 then just remove them. If their value is greater than 0.5 then increment the printed value in its decimal form. That is, add one to the last digit, with wrap-around and carry as needed. Any trailing zeroes that are created should be trimmed.
The only time this becomes a problem is if the carry propagates all the way to the first digit and overflows it from 9 to zero. This might be impossible, but I don't know for sure. In this case (+9.99999e17) the answer would be +1e18, so as long as you have tests for that case you should be fine.
So, print the number, split it into sign/mantissa strings and an exponent integer, and string manipulate them to get your result.
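A sketch of that procedure, assuming a 16-character budget (the name trim_round, the fallback when trimming bottoms out, and the buffer sizes are mine):

#include <stdio.h>
#include <string.h>

/* Sketch of the trim-and-round idea: print with %.17g, then repeatedly
 * drop the last mantissa digit, rounding to nearest, until the string
 * fits in `width` characters. Caveat: %.17g can emit a 17-digit integer
 * with no '.' or 'e'; trimming that would change its magnitude, and the
 * needed switch to exponent form is not handled here. */
static void trim_round(char *out, size_t width, double d)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.17g", d);

    while (strlen(buf) > width) {
        char mant[64], exp[16] = "";
        char *e = strpbrk(buf, "eE");
        if (e) {                      /* split off "e+123", kept as-is */
            strcpy(exp, e);
            *e = '\0';
        }
        strcpy(mant, buf);

        size_t n = strlen(mant);
        if (n < 2)
            break;                    /* nothing left to trim: give up */

        char dropped = mant[--n];     /* remove the last digit... */
        mant[n] = '\0';
        if (n && mant[n - 1] == '.')  /* ...and never end on the '.' */
            mant[--n] = '\0';

        if (dropped >= '5') {         /* round to nearest: carry leftwards */
            int carry = 1;
            int i = (int)n - 1;
            while (carry && i >= 0) {
                if (mant[i] == '.')      i--;
                else if (mant[i] == '-') break;
                else if (mant[i] == '9') { mant[i] = '0'; i--; }
                else                     { mant[i]++; carry = 0; }
            }
            if (carry) {              /* all nines overflowed: prepend 1, */
                size_t pos = (i >= 0) ? (size_t)i + 1 : 0;   /* after '-' */
                memmove(mant + pos + 1, mant + pos, strlen(mant + pos) + 1);
                mant[pos] = '1';      /* e.g. "9.9e+17" becomes "10e+17" */
            }
        }
        snprintf(buf, sizeof buf, "%s%s", mant, exp);
    }
    strcpy(out, buf);
}

The caveat in the opening comment is the one case the recipe glosses over: %.17g can print a bare 17-digit integer, where dropping a digit divides the value by ten instead of rounding it.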
It is trickier than first thought.
Given the various corner cases, it seems best to try at a high precision and then work down as needed.
- Any negative number prints the same as a positive number with 1 less precision due to the '-'.
- The '+' sign is not needed at the beginning of the string nor after the 'e'.
- The '.' is not needed.
- It is dangerous to use anything other than sprintf() to do the mathematical part given so many corner cases. Given various rounding modes, FLT_EVAL_METHOD, etc., leave the heavy coding to well-established functions.
- When an attempt is too long by more than 1 character, iterations can be saved. E.g. if an attempt with precision 14 resulted in a width of 20, there is no need to try precision 13 and 12; just go to 11 (see the sketch below).
- Scaling of the exponent due to the removal of the '.' must be done after sprintf() to 1) avoid injecting computational error and 2) avoid decrementing a double below its minimum exponent.
- Maximum relative error is less than 1 part in 2,000,000,000, as with -1.00000000049999e-200. Average relative error is about 1 part in 50,000,000,000.
- 14-digit precision, the highest, occurs with numbers like 12345678901234e1, so start with 16-2 digits.
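A minimal sketch of that try-high-then-step-down loop (the name ds_sprintf and the width parameter are my assumptions; the '.'-removal and exponent-rescaling refinements listed above, which buy the extra digits, are deliberately left out):

#include <stdio.h>
#include <string.h>

/* Sketch: try a high precision first and step down on failure, cutting
 * the precision by the overshoot instead of by 1 (the iteration-saving
 * trick above). All rounding is left to sprintf(), as advised. */
static int ds_sprintf(char *dest, int width, double x)
{
    int prec = width - 2;            /* "start with 16-2 digits" */
    while (prec >= 0) {
        char buf[32];
        int len = sprintf(buf, "%.*e", prec, x);
        if (len <= width) {
            strcpy(dest, buf);
            return len;              /* fits within `width` characters */
        }
        prec -= len - width;         /* too long by k: skip k precisions */
    }
    return -1;                       /* width too small even for "%.0e" */
}

With width 16, a negative number naturally settles one precision step below its positive counterpart, matching the first point above.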
For finite floating point values the printf() format specifier "%e" well matches "A floating point number shall be ... with an 'E' or 'e' to indicate the start of the exponent".
The sign is present with negative numbers and likely -0.0. The exponent is at least 2 digits. If we assume DBL_MAX < 1e1000 (safe for IEEE 754-1985 double), then the below works in all cases: 1 optional sign, 1 lead digit, '.', 8 digits, 'e', sign, up to 3 digits. (Note: the "16 bytes maximum" does not seem to refer to C string null character termination. Adjust by 1 if needed.)
But this reserves room for the optional sign and 2 to 3 exponent digits. The catch is that the boundary, due to rounding, of when a number uses 2 or 3 exponent digits is fuzzy. Even when testing for negative numbers, -0.0 is an issue. [Edit] Also needed: a test for very small numbers.
Candidate:
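A sketch of what this candidate amounts to (the variable names are mine): with 8 digits after the point, even a sign plus a 3-digit exponent totals exactly 16 characters, so no length test is needed.

#include <stdio.h>

int main(void)
{
    double value = -1.00000000049999e-200;   /* worst-ish case from above */
    char buf[18];                            /* 16 chars plus '\0' slack */

    /* sign + lead digit + '.' + 8 digits + 'e' + sign + up to 3 exponent
     * digits: never more than 16 characters for a finite double. */
    sprintf(buf, "%.8e", value);
    puts(buf);                               /* -1.00000000e-200 */
    return 0;
}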
Additional concerns:
- Some compilers print at least 3 exponent digits.
- The maximum number of decimal significant digits needed for an IEEE 754-1985 double varies with the definition of "need", but is likely about 15-17; see Printf width specifier to maintain precision of floating-point value.
Candidate 2: One-time test for too long an output
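A sketch of that one-time test (the helper name ds_print16 is mine): print one digit more than always fits, and reprint only when a sign and/or a 3-digit exponent pushed the length past 16.

#include <stdio.h>
#include <string.h>

/* Sketch: "%.9e" yields 15 to 17 characters depending on the sign and
 * the exponent width; only when it overflows 16 is a second, shorter
 * print needed. */
static void ds_print16(char *dest, double value)
{
    char buf[32];
    int len = sprintf(buf, "%.9e", value);
    if (len > 16)                            /* the one-time test */
        sprintf(buf, "%.*e", 9 - (len - 16), value);
    strcpy(dest, buf);
}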
Printing in decimal cannot work because for some numbers a 17 digit mantissa is needed which uses up all of your space without printing the exponent. To be more precise, printing a double in decimal sometimes requires more than 16 characters to guarantee accurate round-tripping.
Instead you should print the underlying binary representation using hexadecimal. This will use exactly 16 bytes, assuming that a null-terminator isn't needed.
If you want to print the results using fewer than 16 bytes then you can basically uuencode it. That is, use more than 16 digits so that you can squeeze more bits into each digit. If you use 64 different characters (six bits) then a 64-bit double can be printed in eleven characters. Not very readable, but tradeoffs must be made.
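For illustration, a sketch of the hexadecimal route (not DICOM DS conformant, as the question's edit notes):

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    double d = 0.1;
    uint64_t bits;
    char buf[17];

    memcpy(&bits, &d, sizeof bits);      /* well-defined way to type-pun */
    sprintf(buf, "%016" PRIX64, bits);   /* always exactly 16 characters */
    puts(buf);                           /* 3FB999999999999A for 0.1 */
    return 0;
}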
The C library formatter has no direct format for your requirement. At a simple level, if you can accept the waste of characters of the standard %g format (e20 is written e+020: 2 chars wasted), you can start from the %.17g format and decrease the precision until the output fits. Code could look like:
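A sketch of such a loop, assuming a 16-character budget (the helper name ds_g17 is mine):

#include <stdio.h>
#include <string.h>

/* Sketch: start from the full 17 significant digits of %g and lower
 * the precision until the formatted value fits in 16 characters. */
static void ds_g17(char *dest, double value)
{
    for (int prec = 17; prec > 0; prec--) {
        char buf[32];
        if (sprintf(buf, "%.*g", prec, value) <= 16) {
            strcpy(dest, buf);
            return;
        }
    }
    /* not reached for finite doubles: "%.1g" is at most a few chars */
}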
If you really try to be optimal (meaning write e30 instead of e+030), you could try to use the %1.16e format and post-process the output. Rationale (for positive numbers):
- the %1.16e format allows you to separate the mantissa and the exponent (base 10)
- when the exponent is 0, you can drop the exponent part and fill the available size with the rounded mantissa
- otherwise, use the e format with a minimal size for the exponent part and fill the remaining size with the rounded mantissa
Corner cases:
- for a negative number, write a '-' and add the display for the opposite number with size-1
- for rounding, when the first removed digit is >= 5, increase the preceding digit, and iterate if it was a 9. Process 9.9999999999... as a special case rounded to 10
Possible code:
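A sketch in the spirit of this rationale (my reconstruction: rather than hand-rolling the carry propagation and the 9.999... special case, it lets sprintf() round at decreasing precision and only rewrites the exponent in its minimal form):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: print with "%.*e", rewrite "e+030" as "e30" (dropping "e0"
 * entirely), and lower the precision until the result fits in `size`
 * characters. */
static void ds_minimal(char *dest, size_t size, double value)
{
    for (int prec = 16; prec >= 0; prec--) {
        char buf[32];
        sprintf(buf, "%.*e", prec, value);

        char *e = strchr(buf, 'e');
        if (!e) {                         /* inf/nan: not a valid DS anyway */
            strcpy(dest, buf);
            return;
        }
        int exp10 = atoi(e + 1);          /* parses sign and leading zeros */
        if (exp10 == 0)
            *e = '\0';                    /* "1.5e+00" -> "1.5" */
        else
            sprintf(e + 1, "%d", exp10);  /* "e+030" -> "e30", "e-05" -> "e-5" */

        if (strlen(buf) <= size) {
            strcpy(dest, buf);
            return;
        }
    }
}

With size 16, for example, -1e-100 comes out as -1.00000000e-100.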