What causes Python's float_repr_style to use l

Posted 2020-03-03 07:19

Question:

On nearly every system, Python can give you a short, human-readable representation of a floating-point number rather than the 17-digit machine-precision form:

Python 3.3.0 (default, Dec 20 2014, 13:28:01) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1
0.1
>>> import sys; sys.float_repr_style
'short'

On an ARM926EJ-S, you don't get the short representation:

Python 3.3.0 (default, Jun  3 2014, 12:11:19) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1
0.10000000000000001
>>> import sys; sys.float_repr_style
'legacy'

Python 2.7 apparently added this short representation to repr(), for most systems:

Conversions between floating-point numbers and strings are now correctly rounded on most platforms. These conversions occur in many different places: str() on floats and complex numbers; the float and complex constructors; numeric formatting; serializing and deserializing floats and complex numbers using the marshal, pickle and json modules; parsing of float and imaginary literals in Python code; and Decimal-to-float conversion.

Related to this, the repr() of a floating-point number x now returns a result based on the shortest decimal string that’s guaranteed to round back to x under correct rounding (with round-half-to-even rounding mode). Previously it gave a string based on rounding x to 17 decimal digits.

The rounding library responsible for this improvement works on Windows and on Unix platforms using the gcc, icc, or suncc compilers. There may be a small number of platforms where correct operation of this code cannot be guaranteed, so the code is not used on such systems. You can find out which code is being used by checking sys.float_repr_style, which will be short if the new code is in use and legacy if it isn’t.

Implemented by Eric Smith and Mark Dickinson, using David Gay’s dtoa.c library; issue 7117.
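To make the "shortest string that rounds back to x" idea concrete, here is a minimal sketch that searches for the fewest significant digits that still round-trip and compares them with the 17-digit form. The helper name shortest_digits is hypothetical, and the brute-force search is only an illustration: it is not how dtoa.c works, and its output layout can differ from repr() for very large or very small values.

```python
import sys


def shortest_digits(x: float) -> str:
    """Return the %g-style string with the fewest significant digits
    that converts back to exactly the same float (illustration only)."""
    for precision in range(1, 18):
        text = format(x, ".{}g".format(precision))
        if float(text) == x:
            return text
    # 17 significant digits always identify an IEEE 754 double.
    return format(x, ".17g")


value = 0.1
seventeen = format(value, ".17g")   # the 17-digit form
shortest = shortest_digits(value)   # the fewest digits that round-trip

# Both strings name the same double; only their length differs.
assert float(seventeen) == float(shortest) == value

print("float_repr_style:", sys.float_repr_style)
print("17 digits:", seventeen)
print("shortest: ", shortest)
print("repr():   ", repr(value))
```

On a build that reports 'short', repr(0.1) should agree with the shortest form; on a 'legacy' build it should agree with the 17-digit form.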

They say some platforms can't guarantee correct operation (of dtoa.c, I assume), but they don't say which platform limitations cause this.

What is it about the ARM926EJ-S that means the short float repr() can't be used?

Answer 1:

Short answer: this is probably not a limitation of the platform but of Python's build machinery, which has no universal way to set 53-bit precision for floating-point computations.

For more detail, take a look at the Include/pyport.h file in the Python source distribution. Here's an excerpt:

/* If we can't guarantee 53-bit precision, don't use the code
   in Python/dtoa.c, but fall back to standard code.  This
   means that repr of a float will be long (17 sig digits).

   Realistically, there are two things that could go wrong:

   (1) doubles aren't IEEE 754 doubles, or
   (2) we're on x86 with the rounding precision set to 64-bits
       (extended precision), and we don't know how to change
       the rounding precision.
 */

#if !defined(DOUBLE_IS_LITTLE_ENDIAN_IEEE754) && \
    !defined(DOUBLE_IS_BIG_ENDIAN_IEEE754) && \
    !defined(DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754)
#define PY_NO_SHORT_FLOAT_REPR
#endif

/* double rounding is symptomatic of use of extended precision on x86.  If
   we're seeing double rounding, and we don't have any mechanism available for
   changing the FPU rounding precision, then don't use Python/dtoa.c. */
#if defined(X87_DOUBLE_ROUNDING) && !defined(HAVE_PY_SET_53BIT_PRECISION)
#define PY_NO_SHORT_FLOAT_REPR
#endif

Essentially, there are two things that can go wrong. One is that the Python configuration fails to identify the floating-point format of a C double. That format is almost always IEEE 754 binary64, but sometimes the config script fails to figure that out. That's the first #if preprocessor check in the snippet above. Look at the pyconfig.h file generated at compile time, and see if at least one of the DOUBLE_IS_... macros is #defined. Alternatively, try this at a Python prompt:

>>> float.__getformat__('double')
'IEEE, little-endian'

If you see something like the above, this part should be okay. If you see something like 'unknown', then Python hasn't managed to identify the floating-point format.
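If it helps to collect the relevant facts in one place, something like the following could be run on the target board. The function name describe_float_support is made up for this sketch; the attributes it reads (float.__getformat__, sys.float_info.mant_dig, sys.float_repr_style) are the ones CPython exposes.

```python
import sys


def describe_float_support() -> dict:
    """Gather what the running interpreter reports about C doubles."""
    return {
        # 'IEEE, little-endian', 'IEEE, big-endian' or 'unknown'
        "double_format": float.__getformat__("double"),
        # 53 for an IEEE 754 binary64 double
        "mantissa_bits": sys.float_info.mant_dig,
        # 'short' if dtoa.c is in use, 'legacy' otherwise
        "repr_style": sys.float_repr_style,
    }


for key, value in describe_float_support().items():
    print("{}: {}".format(key, value))
```

A result with double_format set to 'unknown' would point at the first preprocessor check; a recognised IEEE format combined with repr_style 'legacy' would point at the second.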

The second thing that can go wrong is that we do have IEEE 754 binary64 format doubles, but Python's build machinery can't figure out how to ensure 53-bit precision for floating-point computations on this platform. The dtoa.c source requires that all floating-point operations (whether implemented in hardware or software) are performed at a precision of 53 bits. That's particularly a problem on Intel processors that use the x87 floating-point unit for double-precision computations (as opposed to the newer SSE2 instructions): the default precision of the x87 is 64 bits, and using it for double-precision computations with that default setting leads to double rounding, which breaks the dtoa.c assumptions. So at config time, the build machinery runs a check to see (1) whether double rounding is a potential problem, and (2) if so, whether there's a way to put the FPU into 53-bit precision. So now you want to look at pyconfig.h for the X87_DOUBLE_ROUNDING and HAVE_PY_SET_53BIT_PRECISION macros.
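To show why double rounding matters, here is a small sketch that models it with exact rational arithmetic instead of a real FPU. The helper round_to_precision is hypothetical and handles only positive values; it rounds to a given number of significant bits with ties going to even. Rounding the same value first to 64 bits and then to 53 bits gives a different answer from rounding it to 53 bits directly.

```python
from fractions import Fraction


def round_to_precision(value: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant bits, ties to even."""
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find the exponent e such that 2**e <= value < 2**(e + 1).
    e = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** e > value:
        e -= 1
    # Spacing between neighbouring representable values at this exponent.
    ulp = Fraction(2) ** (e - bits + 1)
    # round() on a Fraction rounds half to even and returns an int.
    return round(value / ulp) * ulp


# A value just above the midpoint between 1 and the next double, 1 + 2**-52.
exact = 1 + Fraction(1, 2**53) + Fraction(1, 2**65)

# Correct behaviour: a single rounding to 53 bits.
once = round_to_precision(exact, 53)

# Extended-precision behaviour: round to 64 bits first, then to 53 bits.
twice = round_to_precision(round_to_precision(exact, 64), 53)

print("single rounding:", once == 1 + Fraction(1, 2**52))
print("double rounding:", twice == 1)
```

The first rounding discards the small term that broke the tie, so the second rounding sees an exact halfway case and goes to the even neighbour. This is the kind of discrepancy that a correctly rounded conversion cannot tolerate.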

So it could be either of the above. If I had to guess, I'd guess that on that platform, double rounding is being detected as a problem, and it's not known how to fix it. The solution in that case is to adapt pyport.h to define the _Py_SET_53BIT_PRECISION_* macros in whatever platform-specific way works to get that 53-bit precision mode, and then to define HAVE_PY_SET_53BIT_PRECISION.
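If the build's pyconfig.h is installed alongside the interpreter, the macros mentioned above can also be read from Python rather than by opening the file. This is only a sketch: float_config_macros is a made-up name, the header may be absent on a stripped-down target, and a macro that is set somewhere other than pyconfig.h will simply show up as None here.

```python
import os
import sysconfig

# Macro names taken from the pyport.h excerpt and discussion above.
MACROS = (
    "DOUBLE_IS_LITTLE_ENDIAN_IEEE754",
    "DOUBLE_IS_BIG_ENDIAN_IEEE754",
    "DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754",
    "X87_DOUBLE_ROUNDING",
    "HAVE_PY_SET_53BIT_PRECISION",
)


def float_config_macros():
    """Return {macro: value} read from pyconfig.h, or None if it is missing.

    A value of None means the macro does not appear in the file;
    0 means it appears as a commented-out #undef.
    """
    path = sysconfig.get_config_h_filename()
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8", errors="replace") as header:
        defined = sysconfig.parse_config_h(header)
    return {name: defined.get(name) for name in MACROS}


result = float_config_macros()
if result is None:
    print("pyconfig.h not found at", sysconfig.get_config_h_filename())
else:
    for name, value in result.items():
        print("{} = {!r}".format(name, value))
```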