Unix signals doubt - on the execution of the below

Posted 2019-08-13 22:52

Question:

I have the program below:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int x = 1;

void ouch(int sig) {
    printf("OUCH!  dividing by zero!\n");
    x = 0; 
}

void fpe(int sig) {
    printf("FPE!  I got a signal: %d\n",sig);
    psignal(sig, "psignal");
    x = 1; 
}

int main(void) {
    (void) signal(SIGINT, ouch);
    (void) signal(SIGFPE, fpe);

    while(1)
    {
        printf("Hello World: %d\n",1/x);
        sleep(1);
    }
}

Problem: While running this program, when I send a SIGINT to it from the terminal, "OUCH!  dividing by zero!" is printed, as expected. The next message is "FPE!  I got a signal: 8" followed by "psignal: Floating point exception", and this message repeats endlessly. My question: the fpe signal handler sets x back to 1, so I expect "Hello World" to be displayed again after it runs, but that never happens.

Below is a transcript of the output I am getting:

Hello World: 1
Hello World: 1
^COUCH!  dividing by zero!
FPE!  I got a signal: 8
psignal: Floating point exception
FPE!  I got a signal: 8
psignal: Floating point exception
FPE!  I got a signal: 8
psignal: Floating point exception
^COUCH!  dividing by zero!

.
.
.
.

Answer 1:

When the signal handler is entered, the program counter (the CPU register pointing at the currently executing instruction) is saved at the point where the divide-by-zero occurred. Returning from the handler restores the PC to exactly the same place, so the same instruction runs again and the signal is triggered again (and again, and again).

The value or volatility of 'x' is irrelevant by this point - the zero has been transferred into a CPU register in readiness to perform the divide.

man 2 signal notes that:

According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions. Integer division by zero has undefined result. On some architectures it will generate a SIGFPE signal. (Also dividing the most negative integer by -1 may generate SIGFPE.) Ignoring this signal might lead to an endless loop.

We can see this in gdb if you compile with the debug flag:

simon@diablo:~$ gcc -g -o sigtest sigtest.c 
simon@diablo:~$ gdb sigtest
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...

By default gdb won't pass SIGINT to the process - change this so it sees the first signal:

(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y

Signal        Stop      Print   Pass to program Description
SIGINT        Yes       Yes     Yes             Interrupt

Off we go:

(gdb) run
Starting program: /home/simon/sigtest 
x = 1
Hello World: 1

Now let's interrupt it:

^C
Program received signal SIGINT, Interrupt.
0xb767e17b in nanosleep () from /lib/libc.so.6

and onwards to the divide:

(gdb) cont
Continuing.
OUCH!  dividing by zero!
x = 0

Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30              printf("Hello World: %d\n",1/x);

Check the value of 'x', and continue:

(gdb) print x
$1 = 0
(gdb) cont
Continuing.
FPE!  I got a signal: 8
psignal: Floating point exception

Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30              printf("Hello World: %d\n",1/x);
(gdb) print x
$2 = 1

x is clearly now 1 and we still got a divide-by-zero - what's going on? Let's inspect the underlying assembler:

(gdb) disassemble 
Dump of assembler code for function main:
0x080484ca:  lea    0x4(%esp),%ecx
0x080484ce:  and    $0xfffffff0,%esp
...
0x08048533:  mov    %eax,%ecx
0x08048535:  mov    %edx,%eax
0x08048537:  sar    $0x1f,%edx
0x0804853a:  idiv   %ecx           <<-- address FPE occurred at
0x0804853c:  mov    %eax,0x4(%esp)
0x08048540:  movl   $0x8048653,(%esp)
0x08048547:  call   0x8048384
0x0804854c:  jmp    0x8048503
End of assembler dump.

A quick search tells us that IDIV divides the value in EDX:EAX by the source operand (here ECX). You can probably guess the register contents:

(gdb) info registers 
eax            0x1  1
ecx            0x0  0
...
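
So the idiv is simply retried with ECX still 0, whatever the handler stored into x. A handler for a hardware-generated SIGFPE therefore should not just return. Below is a minimal sketch of one alternative, assuming a POSIX system: install the handler with sigaction() and SA_SIGINFO, report the cause from si_code using only async-signal-safe calls, and terminate. The names fpe_reason, fpe_fatal and install_fpe_fatal are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map a SIGFPE si_code to a short description. */
const char *fpe_reason(int code) {
    switch (code) {
    case FPE_INTDIV: return "integer divide by zero";
    case FPE_INTOVF: return "integer overflow";
    case FPE_FLTDIV: return "floating-point divide by zero";
    default:         return "other arithmetic fault";
    }
}

/* Report with async-signal-safe calls only (write, _exit), then terminate
 * instead of returning to the faulting instruction. */
static void fpe_fatal(int sig, siginfo_t *info, void *ctx) {
    const char *msg = fpe_reason(info->si_code);
    size_t len = 0;

    (void)sig;
    (void)ctx;
    while (msg[len] != '\0')    /* manual length count, no library call */
        len++;
    (void)write(STDERR_FILENO, "SIGFPE: ", 8);
    (void)write(STDERR_FILENO, msg, len);
    (void)write(STDERR_FILENO, "\n", 1);
    _exit(1);
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_fpe_fatal(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fpe_fatal;
    sa.sa_flags = SA_SIGINFO;   /* ask for the three-argument handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}
```

In the program from the question, calling install_fpe_fatal() in place of signal(SIGFPE, fpe) should, on a platform where the division traps, print one message and exit rather than loop.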


Answer 2:

You should declare x as volatile int x to ensure that the compiler reloads x from memory each time through the loop. Given that your SIGINT handler takes effect, this probably does not explain your specific problem, but if you try more complicated examples (or crank up the optimization level) it will eventually bite you.
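
A minimal sketch of that advice, assuming a POSIX system. It goes one step further than volatile int and uses volatile sig_atomic_t, the type the C standard designates for objects written from a signal handler, and it tests the value before dividing so that the fault never happens. The names divisor, on_sigint, install_sigint and checked_reciprocal are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Shared between the handler and the main code. */
volatile sig_atomic_t divisor = 1;

static void on_sigint(int sig) {
    (void)sig;
    divisor = 0;    /* no printf here: it is not async-signal-safe */
}

/* Install the SIGINT handler; returns 0 on success, -1 on failure. */
int install_sigint(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Read the shared value once, test it, and only then divide.
 * Returns -1 when the divisor is zero. */
int checked_reciprocal(void) {
    sig_atomic_t d = divisor;   /* single read of the volatile object */

    return (d == 0) ? -1 : 1 / d;
}
```

Copying divisor into a local first matters: testing and dividing the volatile object directly would read it twice, and the handler could run between the two reads.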



Answer 3:

After handling a signal raised while executing an instruction, the PC may return to either that instruction or to the following one. Which one it does is very CPU + OS specific. In addition, whether integer division by zero raises SIGFPE is also CPU + OS dependent.

At the CPU level, after taking an exception, it makes most sense to let the OS do whatever it needs to (think of page faults or TLB misses) and then return to the offending instruction and run it again. (The OS may have to do some address correction first. For instance, ARM CPUs point two instructions after the offending instruction, a testament to their original 3-stage pipeline, while MIPS CPUs point to the jump when taking an exception from an instruction in a jump delay slot.)

At the OS level, there are several ways to handle exceptions:

  • Do the necessary handling (swap memory in, update page tables, etc...) and rerun the instruction.
  • Emulate that instruction, advance the PC accordingly and return to the next instruction. This allows for emulation of unimplemented instructions (CPUs with no FPU or an incomplete one, LL/SC on MIPS I CPUs, ...) and of unsupported alignment (after taking an alignment exception, the OS may decide to send a SIGBUS to the process, or to emulate the unsupported access, possibly while logging it).
  • Send a fatal signal to the process. The process may then take over the role of the OS in handling the exception, using CPU + OS dependent methods, such as the siginfo method linked by Simonj.

A non-portable method to deal with SIGFPE is calling longjmp() from the signal handler, as in my answer to a similar question on SIGSEGV.

n1318 has more details on calling longjmp() from a signal handler than you ever wanted to know. Also note that POSIX specifies that longjmp() should work from non-nested signal handlers.
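
The longjmp() approach can be sketched as follows, assuming a POSIX system and using the sigsetjmp()/siglongjmp() pair so that the signal mask is restored on the way out. This is an illustration rather than the code from the linked answer, and run_guarded, fpe_recover, no_fault and simulated_fault are hypothetical names. Because whether 1/0 traps at all is CPU + OS specific, the fault is simulated with raise(SIGFPE).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fpe_env;      /* jump target shared with the handler */

static void fpe_recover(int sig) {
    (void)sig;
    siglongjmp(fpe_env, 1);     /* do not return to the faulting instruction */
}

/* Run fn() with SIGFPE guarded.
 * Returns 0 if fn returned normally, 1 if SIGFPE arrived,
 * -1 if the handler could not be installed. */
int run_guarded(void (*fn)(void)) {
    struct sigaction sa, old;
    int faulted = 0;

    assert(fn != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fpe_recover;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    /* Second argument 1: save the signal mask, so that siglongjmp
     * restores it and SIGFPE is not left blocked afterwards. */
    if (sigsetjmp(fpe_env, 1) == 0)
        fn();
    else
        faulted = 1;

    sigaction(SIGFPE, &old, NULL);  /* put the previous disposition back */
    return faulted;
}

/* Stand-ins for real work. */
void no_fault(void) { }
void simulated_fault(void) { raise(SIGFPE); }
```

The single global fpe_env keeps the sketch short but makes run_guarded neither reentrant nor thread-safe. Any state that fn left half-updated when the signal arrived also stays that way after the jump.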