
Question:

I'd like to know whether it is possible to catch the SIGSEGV signal in a multithreaded environment, and what the recommended way to do so is. I am particularly interested in handling the SIGSEGV raised by something like *((int *)0) = 0.

Some reading on this topic led me to signal() and sigaction(), which install a signal handler, but neither seems promising in a multithreaded environment. I then tried sigwaitinfo(), receiving the signals in one thread after a prior pthread_sigmask() call that blocks the signal in the others. This worked when SIGSEGV was raised with raise() inside a thread, or when it was sent to the process by something like kill -SIGSEGV; however, *((int *)0) = 0 still kills the process. My test program is as follows:

void block_signal()
{
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);
        sigprocmask(SIG_BLOCK, &set, NULL);

        if (pthread_sigmask(SIG_BLOCK, &set, NULL)) {
                fprintf(stderr, "pthread_sigmask failed\n");
                exit(EXIT_FAILURE);
        }
}

void *buggy_thread(void *param)
{
        char *ptr = NULL;
        block_signal();

        printf("Thread %lu created\n", (unsigned long)pthread_self());

        // Sleep for some random time
        { ... }

        printf("About to raise from %lu\n", (unsigned long)pthread_self());

        // Raise a SIGSEGV
        *ptr = 0;

        pthread_exit(NULL);
}

void *dispatcher(void *param)
{
        sigset_t set;
        siginfo_t info;
        int sig;

        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);

        for (;;) {
                sig = sigwaitinfo(&set, &info);
                if (sig == -1)
                        fprintf(stderr, "sigwaitinfo failed\n");
                else
                        printf("Received signal SIGSEGV from %u\n", info.si_pid);
        }
}

int main()
{
        int i;
        pthread_t tid;
        pthread_t disp_tid;

        block_signal();

        if (pthread_create(&disp_tid, NULL, dispatcher, NULL)) {
                fprintf(stderr, "Cannot create dispatcher\n");
                exit(EXIT_FAILURE);
        }

        for (i = 0; i < 10; ++i) {
                if (pthread_create(&tid, NULL, buggy_thread, NULL)) {
                        fprintf(stderr, "Cannot create thread\n");
                        exit(EXIT_FAILURE);
                }
        }

        pause();
}

Unexpectedly, the program dies with a segmentation fault instead of printing the raiser's thread id.

Answer 1:

Your code does not call sigaction(2), and I believe it should. Also read signal(7) and signal-safety(7). The signal handler (installed through the sa_sigaction field) should do something machine-specific with its siginfo_t to skip the offending machine instruction, or mmap the offending address, or call siglongjmp. Otherwise, when the handler returns, you'll get SIGSEGV again, since the offending machine instruction is restarted.

You cannot handle the SIGSEGV in another thread, since synchronous signals (such as SIGSEGV or SIGSYS) are thread-specific (see this answer), so what you are trying to achieve with sigwaitinfo cannot work. In particular, SIGSEGV is directed to the offending thread.

Answer 2:

Signal delivery for SIGSEGV caused by a faulting memory access is to the thread that performed the invalid access. Per POSIX (XSH 2.4.1):

At the time of generation, a determination shall be made whether the signal has been generated for the process or for a specific thread within the process. Signals which are generated by some action attributable to a particular thread, such as a hardware fault, shall be generated for the thread that caused the signal to be generated. Signals that are generated in association with a process ID or process group ID or an asynchronous event, such as terminal activity, shall be generated for the process.

The problematic aspect of trying to handle SIGSEGV in a multi-threaded program is that, while delivery and signal mask are thread-local, the signal disposition (i.e. what handler to call) is process-global. In other words, sigaction sets a signal handler for the whole process, not just the calling thread. This means that multiple threads each trying to set up their own SIGSEGV handlers will clobber each other's settings.

The best solution I can propose is to set a global signal handler for SIGSEGV using sigaction, preferably with SA_SIGINFO so you get additional information about the fault, then have a thread-local variable for a handler for the specific thread. Then, the actual signal handler can be:

_Thread_local void (*thread_local_sigsegv_handler)(int, siginfo_t *, void *);
static void sigsegv_handler(int sig, siginfo_t *si, void *ctx)
{
    thread_local_sigsegv_handler(sig, si, ctx);
}

Note that this makes use of C11 thread-local storage. If you don't have that available, you can fall back to either "GNU C" __thread thread-local storage, or POSIX thread-specific data (using pthread_key_create and pthread_setspecific/pthread_getspecific). Strictly speaking, the latter are not async-signal-safe, so calling them from the signal handler invokes UB if the illegal access took place inside a non-async-signal-safe function in the standard library. However, if it took place in your own code, you can be sure no non-async-signal-safe function was interrupted by the signal handler, and thus these functions have well-defined behavior (well, modulo the fact that your whole program probably already has UB from whatever it did to generate SIGSEGV...).

Answer 3:

"Why do you want to catch SIGSEGV ? What will you do after having caught it?"

The most common answer would be: quit/abort. But then, what would be the reason to even deliver this signal to a process instead of just arbitrarily terminating it?

The answer is: because signals, including SIGSEGV, are just exceptions, and it is very important for some applications to, e.g., set the hardware outputs to a "safe mode" or make sure that some important data is left in a consistent state before terminating the process.

There are generally two kinds of segfaults: those caused by write operations and those caused by read operations.

Segfaults caused by read operations are perfectly safe to catch and even to ignore in some cases(1). Failed write operations need more attention and effort to be processed safely (there is a risk of data/memory corruption), but this is also possible (e.g. by avoiding dynamic memory allocation after a segfault).

The problem with "critical signals" (which are delivered to a particular thread, like SIGFPE or SIGSEGV) is that normally the program doesn't "know" the context of the signal, that is, which operation or function has triggered it.

There are at least a few possible ways to get that information, for example:

Each thread can perform only a single class of small operations, so if it gets a signal, it's easy to tell what happened -> terminate the thread, verify the processed data, etc. -> terminate safely.
Use C exceptions. There are a few ready-to-use solutions; mine is libcxc.

(1) E.g. the famous problem with ESRCH and pthread_kill() issued for a thread which has already exited on its own :)