Getting fault address that generated a UNIX signal

Posted 2019-01-27 20:18

Question:

I am interested in a signal handler which can identify the address of the instruction which caused the problem.

I know about siginfo_t and __builtin_return_address, and neither seems to work:

#include <iostream>
#include <signal.h>

void handler (int, siginfo_t *, void *);

int main ()
{
begin:
    std :: cerr << &&begin << " ~ " << &&before << " ~ " << &&after << "\n";

    struct sigaction s;
    s .sa_flags = SA_SIGINFO;
    sigemptyset (& s .sa_mask);
    s .sa_sigaction = handler;
    sigaction (SIGSEGV, &s, NULL);

    int * i = NULL;
before:
    *i = 0;
after:
    std :: cout << "End.\n";
}

void handler (int, siginfo_t *si, void *)
{
    std :: cerr << "si:" << si -> si_addr << "\n";
    std :: cerr << "At: " << __builtin_return_address (0) << "\n";
    std :: cerr << "At: " << __builtin_return_address (1) << "\n";
    std :: cerr << "At: " << __builtin_return_address (2) << "\n";
    std :: cerr << "At: " << __builtin_return_address (3) << "\n";
    std :: cerr << "At: " << __builtin_return_address (4) << "\n";
    std :: cerr << "At: " << __builtin_return_address (5) << "\n";
}

This outputs something like:

0x10978 ~ 0x10a4c ~ 0x10a54
si:0
At: 0xfb945364
At: 0xfb939e64
At: 0x10a40
At: 0x10740
At: 0
At: Segmentation Fault

So si_addr in siginfo_t is NULL, and __builtin_return_address yields values that fall somewhere between the named labels.

I was expecting both of these to return the value of &&before. Am I using these functions correctly?

Tested on Linux 2.6.9-89.0.9.Elsmp and SunOS.

Answer 1:

The third argument to a handler installed with SA_SIGINFO (the one declared as void *) is a pointer to a ucontext_t structure. The contents of this structure are architecture- and OS-specific and not part of any standard, but they include the information you need. Here's a version of your program adapted to use it (Linux/x86-64 specific; you will need #ifdefs for every architecture and OS of interest):

#define _GNU_SOURCE 1
#include <iostream>
#include <iomanip>
#include <signal.h>
#include <ucontext.h>

using std::cout;

static volatile int *causecrash;

static void handler(int, siginfo_t *si, void *ptr)
{
    ucontext_t *uc = (ucontext_t *)ptr;

    cout << "si:" << si->si_addr << '\n';
    cout << "ip:" << std::hex << uc->uc_mcontext.gregs[REG_RIP] << '\n';
}

int main()
{
begin:
    cout.setf(std::ios::unitbuf);
    cout << &&begin << " ~ " << &&before << " ~ " << &&after << '\n';

    struct sigaction s;
    s.sa_flags = SA_SIGINFO|SA_RESETHAND;
    s.sa_sigaction = handler;
    sigemptyset(&s.sa_mask);
    sigaction(SIGSEGV, &s, 0);

before:
    *causecrash = 0;
after:
    cout << "End.\n";
}
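
The per-platform #ifdefs mentioned above could be collected in one small helper. The following is only a sketch: program_counter is a hypothetical name, the three branches assume glibc on Linux (x86-64, i386, AArch64), and any other platform needs its own branch written against that system's ucontext_t layout.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   // glibc only exposes REG_RIP / REG_EIP with this defined
#endif
#include <cstdint>
#include <signal.h>
#include <ucontext.h>

// Hypothetical helper: returns the saved program counter from the context
// pointer that a SA_SIGINFO handler receives as its third argument.
// Returns 0 on platforms that are not listed here.
std::uintptr_t program_counter(const void *context)
{
    const ucontext_t *uc = static_cast<const ucontext_t *>(context);
#if defined(__linux__) && defined(__x86_64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__i386__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_EIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;   // assumption: add a branch for each further OS/architecture
#endif
}
```

Inside a handler this would be called as program_counter(ptr), which keeps the architecture-specific register names out of the handler body.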

By the way, GCC has this nasty habit of moving labels whose address is taken but not used in a control transfer operation (as far as it can tell). Compare:

$ g++ -O0 -W -Wall test.cc && ./a.out 
0x400a30 ~ 0x400acd ~ 0x400ada
si:0
ip:400ad4
Segmentation fault
$ g++ -O2 -W -Wall test.cc && ./a.out 
0x4009f0 ~ 0x4009f0 ~ 0x4009f0
si:0
ip:400ab4
Segmentation fault

See how all the labels are at the same address in the optimized version? That's going to trip up any attempt to, say, recover from the fault by adjusting the PC. IIRC there is a way to make GCC not do that, but I don't know what it is and wasn't able to find it in the manual.
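
If the goal is recovery rather than diagnosis, one way to avoid depending on label addresses at all is sigsetjmp/siglongjmp. This is a different technique from adjusting the PC, shown here only as a sketch: try_store, jump_out and g_recover are hypothetical names, and the code assumes a single-threaded program on a POSIX system where the bad store raises SIGSEGV.

```cpp
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_recover;   // hypothetical: where to resume after a fault

static void jump_out(int, siginfo_t *, void *)
{
    // Abandon the faulting instruction and resume at the sigsetjmp call.
    siglongjmp(g_recover, 1);
}

// Attempts "*p = value". Returns true if the store completed and false if
// it raised SIGSEGV. The previous SIGSEGV disposition is restored on exit.
bool try_store(volatile int *p, int value)
{
    struct sigaction s = {}, old = {};
    s.sa_flags = SA_SIGINFO;
    s.sa_sigaction = jump_out;
    sigemptyset(&s.sa_mask);
    sigaction(SIGSEGV, &s, &old);

    volatile bool ok = false;   // volatile: read again after siglongjmp
    // Second argument 1 saves the signal mask, so SIGSEGV is unblocked
    // again once the handler jumps back here.
    if (sigsetjmp(g_recover, 1) == 0) {
        *p = value;             // may fault
        ok = true;
    }

    sigaction(SIGSEGV, &old, nullptr);
    return ok;
}
```

Because control resumes at a point chosen by the program instead of a computed instruction address, the optimizer moving labels no longer matters. The trade-off is that everything between the sigsetjmp call and the fault is abandoned, so this only suits code that holds no locks or objects with non-trivial destructors in that region.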



Answer 2:

The siginfo_t isn't going to work because its si_addr field holds the memory address that was accessed, not the address of the instruction that accessed it.
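
With a NULL pointer the two cases are hard to tell apart, because si_addr is simply 0. A sketch that faults on a non-NULL address makes the point visible. It assumes Linux (anonymous mmap, SIGSEGV for a PROT_NONE page), and si_addr_for_store, map_inaccessible_page, report_si_addr and g_report_fd are hypothetical names. The child reports through write() and _exit(), which are async-signal-safe, unlike iostreams.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_report_fd = -1;   // hypothetical: pipe write end, set in the child

static void report_si_addr(int, siginfo_t *si, void *)
{
    void *addr = si->si_addr;  // the data address that was accessed
    ssize_t n = write(g_report_fd, &addr, sizeof addr);
    _exit(n == static_cast<ssize_t>(sizeof addr) ? 0 : 1);
}

// Forks a child that stores one byte to `target`. Returns the si_addr seen by
// the child's SIGSEGV handler, or nullptr if no fault was reported.
void *si_addr_for_store(volatile char *target)
{
    int fds[2];
    if (pipe(fds) != 0)
        return nullptr;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return nullptr;
    }
    if (pid == 0) {                       // child: install handler, then fault
        close(fds[0]);
        g_report_fd = fds[1];
        struct sigaction s = {};
        s.sa_flags = SA_SIGINFO;
        s.sa_sigaction = report_si_addr;
        sigemptyset(&s.sa_mask);
        sigaction(SIGSEGV, &s, nullptr);
        *target = 1;
        _exit(2);                         // reached only if the store succeeded
    }

    close(fds[1]);                        // parent: read the reported address
    void *addr = nullptr;
    ssize_t n = read(fds[0], &addr, sizeof addr);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n == static_cast<ssize_t>(sizeof addr) ? addr : nullptr;
}

// Maps one page with no access rights, so any store to it faults.
void *map_inaccessible_page()
{
    void *p = mmap(nullptr, static_cast<size_t>(sysconf(_SC_PAGESIZE)),
                   PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

Passing the page returned by map_inaccessible_page() to si_addr_for_store() should give back that same page address, which is a data address and nowhere near the code of the function that performed the store.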

Now, the __builtin_return_address is interesting. On my machine it returns some nonsense:

0x40089f ~ 0x400935 ~ 0x40093f
si:0
At: 0x7fe22916fc20
At: 0x7fe22915ad8e

I have no idea why. But then I examined the core dump:

(gdb) bt
#0  0x00000000004009ff in handler(int, siginfo*, void*) ()
#1  <signal handler called>
#2  0x0000000000400939 in main ()

As you can see, just like in your case, the offending address is somewhere in between label locations. This is easily explained, though. Just look at the disassembly of main():

(gdb) disas
Dump of assembler code for function main:
   ...
   ; the label is here:
   0x0000000000400935 <+161>:   mov    -0x8(%rbp),%rax
=> 0x0000000000400939 <+165>:   movl   $0x0,(%rax)
   0x000000000040093f <+171>:   mov    $0x400c32,%esi

The labelled statement consists of several instructions. The first one loads the pointer value into the RAX register. It completes successfully because there is nothing wrong with it. It's the second one that stores through that pointer and faults. This explains why the address in your trace is a bit different from the address of the label, although your generated code will probably differ from my example. None of this explains why __builtin_return_address gives nonsense in my case, though.



Tags: c++ unix signals