I'd like to know whether it is possible, and what the recommended way is, to catch the SIGSEGV signal in a multithreaded environment. I am particularly interested in handling the SIGSEGV raised by something like *((int *)0) = 0.

Some reading on this topic led me to signal() and sigaction(), which install a signal handler, but neither seems promising in a multithreaded environment. I then tried sigwaitinfo(), receiving the signals in one thread after a prior pthread_sigmask() call that blocks the signal in the others. It worked when SIGSEGV was raised with raise() inside a thread, or when it was sent to the process by something like kill -SIGSEGV; however, *((int *)0) = 0 still kills the process. My test program is as follows:

    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void block_signal()
    {
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);
        sigprocmask(SIG_BLOCK, &set, NULL);
        if (pthread_sigmask(SIG_BLOCK, &set, NULL)) {
            fprintf(stderr, "pthread_sigmask failed\n");
            exit(EXIT_FAILURE);
        }
    }

    void *buggy_thread(void *param)
    {
        char *ptr = NULL;

        block_signal();
        printf("Thread %lu created\n", pthread_self());
        // Sleep for some random time
        { ... }
        printf("About to raise from %lu\n", pthread_self());
        // Raise a SIGSEGV
        *ptr = 0;
        pthread_exit(NULL);
    }

    void *dispatcher(void *param)
    {
        sigset_t set;
        siginfo_t info;
        int sig;

        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);
        for (;;) {
            sig = sigwaitinfo(&set, &info);
            if (sig == -1)
                fprintf(stderr, "sigwaitinfo failed\n");
            else
                printf("Received signal SIGSEGV from %u\n", info.si_pid);
        }
    }

    int main()
    {
        int i;
        pthread_t tid;
        pthread_t disp_tid;

        block_signal();
        if (pthread_create(&disp_tid, NULL, dispatcher, NULL)) {
            fprintf(stderr, "Cannot create dispatcher\n");
            exit(EXIT_FAILURE);
        }
        for (i = 0; i < 10; ++i) {
            /* The closing parenthesis of the condition was missing here. */
            if (pthread_create(&tid, NULL, buggy_thread, NULL)) {
                fprintf(stderr, "Cannot create thread\n");
                exit(EXIT_FAILURE);
            }
        }
        pause();
    }

Unexpectedly, the program dies with a segmentation fault instead of printing the raiser's thread id.

Your code does not call sigaction(2), and I believe it should. Read also signal(7) and signal-safety(7). The signal action (installed through the sa_sigaction field) should do something machine-specific with its siginfo_t to skip the offending machine instruction, or mmap the offending address, or call siglongjmp; otherwise, when the signal handler returns, you'll get the SIGSEGV again, since the offending machine instruction is restarted.

You cannot handle the SIGSEGV in another thread, since synchronous signals (such as SIGSEGV or SIGSYS) are thread-specific (see this answer), so what you are trying to achieve with sigwaitinfo cannot work. In particular, SIGSEGV is directed to the offending thread.

Read also all about Linux signals.
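
As a rough illustration of the siglongjmp option mentioned above, here is a minimal sketch, assuming Linux with glibc and C11 thread-local storage. All names (on_segv, install_segv_handler, try_null_write, fault_in_thread) are hypothetical and are not taken from the question. Unlike the test program, it leaves SIGSEGV unblocked: POSIX leaves the behavior undefined when a SIGSEGV generated by a hardware fault arrives while the signal is blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none come from the original post. */
static _Thread_local sigjmp_buf recover_point;      /* per-thread jump target */
static _Thread_local volatile sig_atomic_t armed;   /* is recover_point valid? */
static char *volatile null_ptr = NULL;  /* read at run time, so the store is not folded away */

/* Runs in the thread that faulted, because SIGSEGV is delivered to that thread. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    (void)info;
    (void)ucontext;
    if (armed) {
        armed = 0;
        siglongjmp(recover_point, 1);   /* do not return to the faulting store */
    }
    signal(sig, SIG_DFL);   /* unexpected fault: the retried access now kills the process */
}

/* Installs the process-wide SIGSEGV handler; returns 0 on success. */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Returns 1 if the NULL store faulted and the thread recovered, 0 otherwise. */
int try_null_write(void)
{
    if (sigsetjmp(recover_point, 1) != 0)
        return 1;                 /* reached from on_segv via siglongjmp */
    armed = 1;
    volatile char *p = null_ptr;
    *p = 0;                       /* faults: SIGSEGV is sent to this thread */
    armed = 0;
    return 0;
}

static void *worker(void *arg)
{
    *(int *)arg = try_null_write();
    return NULL;
}

/* Runs try_null_write() in a new thread; returns its result, or -1 on error. */
int fault_in_thread(void)
{
    pthread_t tid;
    int caught = 0;
    if (pthread_create(&tid, NULL, worker, &caught) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return caught;
}
```

A caller would run install_segv_handler() once and then fault_in_thread(). Passing 1 as the second argument of sigsetjmp saves the signal mask, so SIGSEGV is unblocked again after the jump. Whether the program is in a usable state after such a recovery depends entirely on what caused the fault.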
PS. An example of clever SIGSEGV handling is offered by the Ravenbrook MPS garbage collector library, which is no longer maintained (as of May 2019). Notice also the recent, Linux-specific userfaultfd(2) and signalfd(2) system calls.

A SIGSEGV caused by a faulting memory access is delivered to the thread that performed the invalid access, per POSIX (XSH 2.4.1).

The problematic aspect of trying to handle SIGSEGV in a multi-threaded program is that, while delivery and signal mask are thread-local, the signal disposition (i.e. what handler to call) is process-global. In other words, sigaction sets a signal handler for the whole process, not just the calling thread. This means that multiple threads each trying to set up their own SIGSEGV handlers will clobber each other's settings.

The best solution I can propose is to set a global signal handler for SIGSEGV using sigaction, preferably with SA_SIGINFO so you get additional information about the fault, then have a thread-local variable holding a handler for the specific thread. The actual signal handler then only has to call the current thread's handler through that variable.

Note that this makes use of C11 thread-local storage. If you don't have that available, you can fall back to either "GNU C" __thread thread-local storage, or POSIX thread-specific data (using pthread_key_create and pthread_setspecific/pthread_getspecific). Strictly speaking, the latter are not async-signal-safe, so calling them from the signal handler invokes UB if the illegal access took place inside a non-async-signal-safe function in the standard library. However, if it took place in your own code, you can be sure no non-async-signal-safe function was interrupted by the signal handler, and thus these functions have well-defined behavior (well, modulo the fact that your whole program probably already has UB from whatever it did to generate SIGSEGV...).

"Why do you want to catch SIGSEGV? What will you do after having caught it?"
The most common answer would be: quit/abort. But then, what would be the reason to even deliver this signal to a process instead of just arbitrarily terminating it?
The answer is: because signals, including SIGSEGV, are just exceptions, and it's very important for some applications to, for example, set the hardware outputs to a "safe mode" or make sure that some important data is left in a consistent state before terminating the process.
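
A minimal sketch of that "clean up, then still die" pattern, assuming Linux and using a pipe as a hypothetical stand-in for the hardware output. The names last_gasp, safe_fd and crash_child are illustrative only. It relies on SA_RESETHAND, which restores the default action when the handler is entered, so the retried access terminates the process with SIGSEGV as usual.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for "the hardware output": a file descriptor. */
static int safe_fd = -1;
static char *volatile null_ptr = NULL;  /* read at run time, so the store is not folded away */

/* Only async-signal-safe calls are made here; write(2) is one of them. */
static void last_gasp(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)si;
    (void)ctx;
    static const char msg[] = "SAFE\n";
    if (safe_fd >= 0) {
        ssize_t n = write(safe_fd, msg, sizeof msg - 1);
        (void)n;
    }
    /* Returning retries the faulting access; SA_RESETHAND has already
       restored the default action, so the process now dies of SIGSEGV. */
}

/* Forks a child that faults. Returns the signal that killed the child
   (or -1) and reports whether the handler wrote its marker first. */
int crash_child(int *marker_seen)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        close(fds[0]);
        safe_fd = fds[1];
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = last_gasp;
        sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) != 0)
            _exit(2);
        volatile char *p = null_ptr;
        *p = 0;       /* faults */
        _exit(1);     /* only reached if the store did not fault */
    }
    close(fds[1]);
    char buf[8] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *marker_seen = (n == 5 && memcmp(buf, "SAFE\n", 5) == 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Because the process still terminates with SIGSEGV, a parent process or supervisor sees the same exit status it would have seen without the handler.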
There are generally 2 kinds of segfaults: caused by write or by read operations.
Segfaults caused by read operations are perfectly safe to catch and even to ignore in some cases (1). Failed write operations need more attention and effort to be processed safely (there is a risk of data or memory corruption), but this is also possible (for example, by not allocating memory dynamically after a segfault).
The problem with "critical signals" (which are delivered to a particular thread, like SIGFPE or SIGSEGV) is that normally the program doesn't "know" the context of the signal, that is, which operation or function triggered it.
There are at least a few possible ways to get this information, for example:
(1) For example, the famous problem with ESRCH and pthread_kill() issued for a thread which has already exited on its own :)
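
The per-thread dispatch scheme discussed earlier in the thread (one process-wide SIGSEGV handler installed with sigaction, forwarding to a handler kept in a thread-local variable) is also one way to give the handler per-thread context. The following is a minimal sketch under those assumptions, for Linux with C11 thread-local storage. Every name in it (segv_dispatch, thread_segv_handler, record_and_resume, probe_address) is hypothetical and not taken from any of the answers.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

typedef void (*segv_fn)(int, siginfo_t *, void *);

/* Hypothetical per-thread state; the names are illustrative only. */
static _Thread_local segv_fn thread_segv_handler;   /* this thread's handler, or NULL */
static _Thread_local sigjmp_buf thread_resume;      /* where this thread resumes */
static _Thread_local void *volatile thread_fault_addr;

/* The one process-wide handler: forward to the faulting thread's own handler. */
static void segv_dispatch(int sig, siginfo_t *si, void *ctx)
{
    if (thread_segv_handler)
        thread_segv_handler(sig, si, ctx);
    signal(sig, SIG_DFL);   /* nobody recovered: the retried access kills the process */
}

/* A per-thread handler: remember the faulting address, then resume the thread. */
static void record_and_resume(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    thread_fault_addr = si->si_addr;
    siglongjmp(thread_resume, 1);
}

/* Installs the process-wide dispatcher; returns 0 on success. */
int install_dispatch(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_dispatch;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

struct probe {
    volatile char *target;   /* address the thread will try to write */
    int faulted;             /* set to 1 if the write raised SIGSEGV */
    void *fault_addr;        /* si_addr reported for the fault */
};

static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    thread_segv_handler = record_and_resume;    /* affects this thread only */
    if (sigsetjmp(thread_resume, 1) == 0) {
        *p->target = 0;                         /* faults if target is not writable */
        p->faulted = 0;
    } else {
        p->faulted = 1;
        p->fault_addr = thread_fault_addr;
    }
    return NULL;
}

/* Writes to target from a fresh thread; returns 1 on fault, 0 if the
   write succeeded, -1 on a thread error. */
int probe_address(volatile char *target, void **fault_addr)
{
    struct probe p = { target, 0, NULL };
    pthread_t tid;
    if (pthread_create(&tid, NULL, probe_thread, &p) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    if (fault_addr)
        *fault_addr = p.fault_addr;
    return p.faulted;
}
```

A thread that never sets thread_segv_handler keeps the default behavior: the dispatcher restores SIG_DFL and the retried access terminates the process. The si_addr field is what tells the per-thread handler which address was touched; which operation was in progress would have to be recorded by the thread itself in another thread-local variable before the risky access.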