C handle signal SIGFPE and continue execution

Posted 2019-06-23 20:09

Question:

I am trying to handle a SIGFPE signal but my program just crashes or runs forever. I HAVE to use signal() and not the other ones like sigaction().

So in my code I have:

#include <stdio.h>
#include <signal.h>

void handler(int signum)
{
    // Do stuff here then return to execution below
}

int main()
{
    signal(SIGFPE, handler);

    int i, j;
    for(i = 0; i < 10; i++) 
    {
        // Call signal handler for SIGFPE
        j = i / 0;
    }

    printf("After for loop");

    return 0;
}

Basically, I want to go into the handler every time there is a division by 0. It should do whatever it needs to inside the handler() function then continue the next iteration of the loop.

This should also work for other signals that need to be handled. Any help would be appreciated.

Answer 1:

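If you have to handle SIGFPE, or any other signal that your own code triggers directly through a faulting CPU instruction, the behavior is only defined if you either exit the program from the signal handler or use longjmp to get out. Returning normally from the handler is undefined; on common platforms the faulting instruction is simply executed again, which is why the program seems to run forever.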

Also note exactly where the calls that restore the previous handler are placed in the code below: at the end of the computation branch, but at the start of the handling branch.

Unfortunately, you can't use signal() like this at all; the second time the signal is raised, the program crashes. You must use sigaction if you intend to handle the signal more than once.

#include <stdio.h>
#include <signal.h>
#include <setjmp.h>
#include <string.h>

jmp_buf fpe;

void handler(int signum)
{
    // Do stuff here then return to execution below
    longjmp(fpe, 1);
}

int main()
{
    volatile int i, j;
    for(i = 0; i < 10; i++) 
    {
        // Call signal handler for SIGFPE
        struct sigaction act;
        struct sigaction oldact;
        memset(&act, 0, sizeof(act));
        act.sa_handler = handler;
        /* SA_NOMASK is just an obsolete Linux alias for SA_NODEFER */
        act.sa_flags = SA_NODEFER;
        sigaction(SIGFPE, &act, &oldact);

        if (0 == setjmp(fpe))
        {
            j = i / 0;
            sigaction(SIGFPE, &oldact, &act);
        } else {
            sigaction(SIGFPE, &oldact, &act);
            /* handle SIGFPE */
        }
    }

    printf("After for loop\n");

    return 0;
}
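
A variant of the same idea, sketched under the assumption of a POSIX system: sigsetjmp called with a nonzero second argument saves the signal mask and siglongjmp restores it, so the handler can be installed once outside the loop and SA_NODEFER is not needed. The names fpe_env, on_fpe and count_recovered_faults are made up for this illustration, and raise(SIGFPE) stands in for the faulting division so that the sketch behaves the same on platforms where integer division by zero does not trap.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fpe_env;   /* hypothetical name, not from the answer above */

static void on_fpe(int signum)
{
    (void)signum;
    /* Jump back to the sigsetjmp call. Because the mask was saved there,
       SIGFPE is unblocked again after the jump. */
    siglongjmp(fpe_env, 1);
}

/* Hypothetical helper: runs `iterations` faulting steps and returns how
   many times the recovery branch was taken, or -1 if setup fails. */
int count_recovered_faults(int iterations)
{
    struct sigaction act, oldact;
    volatile int i;              /* volatile: must survive the jump */
    volatile int recovered = 0;

    memset(&act, 0, sizeof(act));
    act.sa_handler = on_fpe;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGFPE, &act, &oldact) != 0)
        return -1;

    for (i = 0; i < iterations; i++) {
        if (sigsetjmp(fpe_env, 1) == 0) {
            /* Assumption: stand-in for the faulting computation. */
            raise(SIGFPE);
        } else {
            recovered++;         /* handle SIGFPE */
        }
    }

    sigaction(SIGFPE, &oldact, NULL);   /* restore the previous handler */
    return recovered;
}
```

Calling count_recovered_faults(10) from main would take the recovery branch once per iteration and then fall through to whatever follows the loop.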


Answer 2:

Caveat: Sorry to rain on the parade, but you really don't want to do this.

It is perfectly valid to trap [externally generated] signals like SIGINT, SIGTERM, SIGHUP etc. to allow graceful cleanup and termination of a program that may have files open that are partially written to.
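
A minimal sketch of that kind of graceful handling, using only signal() from standard C. The names stop_requested, on_terminate, install_terminate_handlers and run_until_stopped are hypothetical, and the "work" is a placeholder; the point is only that the handler sets a flag and the cleanup happens in ordinary code.

```c
#include <signal.h>

static volatile sig_atomic_t stop_requested = 0;

static void on_terminate(int signum)
{
    (void)signum;
    stop_requested = 1;   /* only set a flag; no cleanup inside the handler */
}

/* Hypothetical helper: installs the handler for SIGINT and SIGTERM.
   Returns 0 on success, -1 on failure. */
int install_terminate_handlers(void)
{
    if (signal(SIGINT, on_terminate) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, on_terminate) == SIG_ERR)
        return -1;
    return 0;
}

/* Hypothetical work loop: does up to max_steps units of work and stops
   early once a termination signal has been seen. Returns steps done. */
int run_until_stopped(int max_steps)
{
    int done = 0;

    while (done < max_steps && !stop_requested) {
        /* ... write one complete record to the open file ... */
        done++;
    }

    /* flush and close files here, then return normally */
    return done;
}
```

Because the loop only checks the flag between units of work, a record is never left half written when the signal arrives.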

However, internally generated signals such as SIGILL, SIGBUS, SIGSEGV and SIGFPE are very hard to recover from meaningfully. The first three are bugs, pure and simple. And, IMO, SIGFPE is a hard bug as well.

After such a signal, your program is in an unsafe and indeterminate state. Even trapping the signal and doing longjmp/siglongjmp doesn't fix this.

And, there is no way to tell exactly how bad the damage is. Or, how bad the damage will become if the program tries to proceed.

If you get SIGFPE, was it for a floating point calculation [which you might be able to smooth over]? Or was it for an integer divide-by-zero? What calculation was being done? And where? You don't know.

Trying to continue can sometimes cause 10x the damage because now the program is out of control. After recovery, the program may be okay, but it may not be. So the reliability of the program after the event cannot be determined with any degree of certainty.

What were the events (i.e., the calculations) that led up to the SIGFPE? Maybe it's not merely a single divide, but the chain of calculations that led up to the value being zero. Where did these values go? Will these now suspect values be used by code after the recovery operation has taken place?

For example, the program might overwrite the wrong file because the failed calculation was somehow involved in selecting the file descriptor that a caller is going to use.

Or, you leak memory. Or, corrupt the heap. Or, was the error within the heap allocation code itself?

Consider the following function:

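#include <fcntl.h>
#include <unistd.h>

// assumed to be defined elsewhere; only here so the fragment compiles
extern char buf[];
extern size_t len;
extern int x, y, z;

void
myfunc(char *file)
{
    int fd;

    fd = open(file,O_WRONLY);

    while (1) {
        // do stuff ...

        // write to the file
        write(fd,buf,len);

        // do more stuff ...

        // generate SIGFPE ...
        x = y / z;
    }

    // never reached if we jump out of the loop via siglongjmp
    close(fd);
}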

Even with a signal handler that does siglongjmp, the file that myfunc was writing to is now corrupted/truncated. And, the file descriptor won't be closed.

Or, what if myfunc was reading from the file and saving the data to some array? That array is only partially filled. Now, you get SIGFPE. This is intercepted by the signal handler which does siglongjmp.

One of the callers of myfunc does the sigsetjmp to "catch" this. But, what can it do? The caller has no idea how bad things are. It might assume that the buffer myfunc was reading into is fully formed and write it out to a different file. That other file has now become corrupted.
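
For the integer case, one way to avoid the signal altogether is to check the operands before dividing. This is only a sketch and checked_div is a hypothetical helper name; it rejects the two int divisions that are undefined in C, a zero divisor and INT_MIN / -1 (the latter also traps on x86).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores num / den in *out and returns true, or
   returns false without dividing when the division would be undefined. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0)
        return false;                   /* division by zero */
    if (num == INT_MIN && den == -1)
        return false;                   /* quotient does not fit in int */

    *out = num / den;
    return true;
}
```

The caller then handles a false return right where the bad value shows up, while it still knows which calculation failed and which resources are open.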


UPDATE:

Oops, forgot to mention undefined behavior ...

Normally, we associate UB, such as writing past the end of an array, with a segfault [SIGSEGV]. But, what if it causes SIGFPE instead?

It's no longer just a "bad calculation" -- we're trapping [and ignoring] UB at the earliest detection point. If we do recovery, the next usage could be worse.

Here's an example:

// assume these are ordered in memory as if they were part of the same struct:
int x[10];
int y;
int z;

void
myfunc(void)
{

    // initialize
    y = 23;
    z = 37;

    // do stuff ...

    // generate UB -- we run one past the end of x and zero out y
    for (int i = 0;  i <= 10;  ++i)
        x[i] = 0;

    // do more stuff ...

    // generate SIGFPE ...
    z /= y;

    // do stuff ...

    // do something _really_ bad with y that causes a segfault or _worse_
    // sends a space rocket off-course ...
}


Tags: c signals