I have a medium-sized C99 program that uses the long double type (80-bit) for floating-point computation. I want to improve precision with the new GCC 4.6 extension __float128. As I understand it, this is software-emulated 128-bit precision math.
How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision?
What do I need to change: compiler flags, sources?
My program reads full-precision values with strtod, performs many different operations on them (such as + - * /, sin, cos, exp and others from <math.h>), and prints them with printf.
PS: Although __float128 is documented only for Fortran (REAL*16), libquadmath is written in C and uses __float128. I'm unsure whether GCC will convert operations on __float128 into runtime library calls, and I'm unsure how to migrate from long double to __float128 in my sources.
PPS: There is documentation for the GCC C language mode: http://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html
"GNU C compiler supports ... 128 bit (TFmode) floating types. Support for additional types includes the arithmetic operators: add, subtract, multiply, divide; unary arithmetic operators; relational operators; equality operators ... __float128 types are supported on i386, x86_64"
How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision? What do I need to change: compiler flags, sources?
You need recent software: a GCC version that supports the __float128 type (4.6 or newer) and libquadmath (supported only on x86 and x86_64 targets; on IA64 and HPPA with newer GCC). You should add the linker flag -lquadmath (the error "cannot find -lquadmath" means that libquadmath is not installed).
- Add the #include <quadmath.h> header to get the macro and function definitions.
- Change all long double variable definitions to __float128.
- Complex variables may be changed to the __complex128 type (from quadmath.h) or declared directly with typedef _Complex float __attribute__((mode(TC))) _Complex128;
- All simple arithmetic operations are handled automatically by GCC (compiled into calls to helper functions like __*tf3(), for example __addtf3() and __multf3()).
- If you use any LDBL_* macros, replace them with their FLT128_* counterparts (full list: http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
- If you need specific constants such as pi (M_PI) or e (M_E) in quadruple precision, use the predefined constants with the q suffix (M_*q), such as M_PIq and M_Eq (full list: http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
- User-defined constants may be written with the Q suffix, like 1.3000011111111Q
- All math function calls should be replaced with their *q versions, like sqrtq() and sinq() (full list: http://gcc.gnu.org/onlinedocs/libquadmath/Math-Library-Routines.html#Math-Library-Routines)
- Reading a quad float from a string should be done with __float128 strtoflt128 (const char *s, char **sp) - http://gcc.gnu.org/onlinedocs/libquadmath/strtoflt128.html#strtoflt128 (Warning: older versions of libquadmath may have bugs in strtoflt128, so double-check the results)
- Printing a __float128 is done with the quadmath_snprintf function. On Linux distributions with a recent glibc, libquadmath may also register itself automatically to handle the Q (possibly also q) length modifier for the a, A, e, E, f, F, g, G conversion specifiers in all printf/sprintf calls, just as L does for long double. Example: printf ("%Qe", 1.2Q), http://gcc.gnu.org/onlinedocs/libquadmath/quadmath_005fsnprintf.html#quadmath_005fsnprintf
You should also know that since 4.6, Gfortran uses the __float128 type for DOUBLE PRECISION if the option -fdefault-real-8 is given and the option -fdefault-double-8 is not. This may be a problem, since 128-bit quad precision is much slower than the standard long double on many platforms because it is computed in software. (Thanks to the post by glennklockwood: http://glennklockwood.blogspot.com/2014/02/linux-perf-libquadmath-and-gfortrans.html)