Force gfortran to stop program at first NaN

Posted 2019-01-22 08:57

Question:

To debug my application (Fortran 90), I want to turn all NaNs into signalling NaNs.

With default settings, my program runs without raising any signals and just writes NaN data to a file. I want to find the point where the NaN is generated. If I can recompile the program with signalling NaNs, I will get a SIGFPE signal at the first point where an invalid floating-point operation occurs.

Answer 1:

The flag you're looking for is -ffpe-trap=invalid; I usually add ,zero,overflow to also trap division by zero and overflow. Consider this test program (nantest.f90):

program nantest
    real :: a, b, c

    a = 1.
    b = 2.

    c = a/b
    print *, c,a,b

    a = 0.
    b = 0.

    c = a/b
    print *, c,a,b

    a = 2.
    b = 1.

    c = a/b
    print *,c,a,b
end program nantest

Then compiling it and running it in a debugger gives:

$ gfortran -o nantest nantest.f90 -ffpe-trap=invalid,zero,overflow -g -static
$ gdb nantest
[...]
(gdb) run
Starting program: /scratch/ljdursi/Testing/fortran/nantest 
  0.50000000       1.0000000       2.0000000    

Program received signal SIGFPE, Arithmetic exception.
0x0000000000400384 in nantest () at nantest.f90:13
13          c = a/b
Current language:  auto; currently fortran

With the Intel Fortran compiler (ifort), the option -fpe0 does the same thing.

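It's a little trickier with C/C++ code; we have to actually insert a call to feenableexcept(), which enables floating-point exceptions and is declared in fenv.h. It is a GNU extension, and glibc only declares it when _GNU_SOURCE is defined before the first #include: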

#define _GNU_SOURCE   /* feenableexcept() is a GNU extension */
#include <stdio.h>
#include <fenv.h>
int main(int argc, char **argv) {  
    float a, b, c;
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

    a = 1.;
    b = 2.;

    c = a/b;
    printf("%f %f %f\n", a, b, c);

    a = 0.;
    b = 0.;

    c = a/b;
    printf("%f %f %f\n", a, b, c);

    a = 2.;
    b = 1.;

    c = a/b;
    printf("%f %f %f\n", a, b, c);

    return 0;
}

But the effect is the same:

$ gcc -o nantest nantest.c -lm -g
$ gdb ./nantest
[...]
(gdb) run
Starting program: /scratch/s/scinet/ljdursi/Testing/exception/nantest  
1.000000 2.000000 0.500000

Program received signal SIGFPE, Arithmetic exception.  
0x00000000004005d0 in main (argc=1, argv=0x7fffffffe4b8) at nantest.c:17  
17        c = a/b;  

Either way, you have a much better handle on where the errors are occurring.