I am interested in a signal handler which can identify the address of the instruction which caused the problem.
I know about siginfo_t and __builtin_return_address, and neither seems to work:
#include <iostream>
#include <signal.h>
void handler (int, siginfo_t *, void *);
int main ()
{
begin:
std :: cerr << &&begin << " ~ " << &&before << " ~ " << &&after << "\n";
struct sigaction s;
s .sa_flags = SA_SIGINFO;
sigemptyset (& s .sa_mask);
s .sa_sigaction = handler;
sigaction (SIGSEGV, &s, NULL);
int * i = NULL;
before:
*i = 0;
after:
std :: cout << "End.\n";
}
void handler (int, siginfo_t *si, void *)
{
std :: cerr << "si:" << si -> si_addr << "\n";
std :: cerr << "At: " << __builtin_return_address (0) << "\n";
std :: cerr << "At: " << __builtin_return_address (1) << "\n";
std :: cerr << "At: " << __builtin_return_address (2) << "\n";
std :: cerr << "At: " << __builtin_return_address (3) << "\n";
std :: cerr << "At: " << __builtin_return_address (4) << "\n";
std :: cerr << "At: " << __builtin_return_address (5) << "\n";
}
This outputs something like:
0x10978 ~ 0x10a4c ~ 0x10a54
si:0
At: 0xfb945364
At: 0xfb939e64
At: 0x10a40
At: 0x10740
At: 0
At: Segmentation Fault
So the si_addr field of siginfo_t is NULL, and __builtin_return_address is yielding values somewhere in between the named labels. I was expecting both of these to return the value of &&before. Am I using these functions correctly?
Tested on Linux 2.6.9-89.0.9.Elsmp and SunOS.
The third argument to a handler installed with SA_SIGINFO (the one declared as void *) is a pointer to a ucontext_t structure. The register state it holds (uc_mcontext) is architecture- and OS-specific and not covered by any standard, but it includes the information you need. Here's a version of your program adapted to use it (Linux/x86-64 specific; you will need #ifdefs for every architecture and OS of interest):
#define _GNU_SOURCE 1
#include <iostream>
#include <iomanip>
#include <signal.h>
#include <ucontext.h>
using std::cout;
static volatile int *causecrash;
static void handler(int, siginfo_t *si, void *ptr)
{
ucontext_t *uc = (ucontext_t *)ptr;
cout << "si:" << si->si_addr << '\n';
cout << "ip:" << std::hex << uc->uc_mcontext.gregs[REG_RIP] << '\n';
}
int main()
{
begin:
cout.setf(std::ios::unitbuf);
cout << &&begin << " ~ " << &&before << " ~ " << &&after << '\n';
struct sigaction s;
s.sa_flags = SA_SIGINFO|SA_RESETHAND;
s.sa_sigaction = handler;
sigemptyset(&s.sa_mask);
sigaction(SIGSEGV, &s, 0);
before:
*causecrash = 0;
after:
cout << "End.\n";
}
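The #ifdefs mentioned above might look roughly like the following sketch. It assumes Linux with glibc and only covers x86-64, i386 and AArch64; the field names REG_RIP, REG_EIP and uc_mcontext.pc are taken from glibc's <sys/ucontext.h>, so check them against your own headers and add other targets (e.g. Solaris) yourself. Instead of letting the process die, it faults on a PROT_NONE page obtained from mmap and leaves the handler with siglongjmp. The names pc_from_context, record_pc and faulting_pc are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   // glibc only exposes REG_RIP / REG_EIP with this
#endif
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

// Extract the saved program counter from the handler's third argument.
// Field names follow glibc's <sys/ucontext.h>; other targets return nullptr.
static void *pc_from_context(void *ctx)
{
    ucontext_t *uc = static_cast<ucontext_t *>(ctx);
#if defined(__linux__) && defined(__x86_64__)
    return reinterpret_cast<void *>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__i386__)
    return reinterpret_cast<void *>(uc->uc_mcontext.gregs[REG_EIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return reinterpret_cast<void *>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return nullptr;
#endif
}

static sigjmp_buf pc_env;
static void *volatile seen_pc;

static void record_pc(int, siginfo_t *, void *ctx)
{
    seen_pc = pc_from_context(ctx);
    siglongjmp(pc_env, 1);   // skip the faulting store instead of retrying it
}

// Hypothetical helper: store into a PROT_NONE page and return the PC the
// handler saw, or nullptr if nothing faulted or the target is unsupported.
void *faulting_pc()
{
    long page = sysconf(_SC_PAGESIZE);
    void *mem = mmap(nullptr, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return nullptr;

    struct sigaction sa{}, old{};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = record_pc;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    seen_pc = nullptr;
    if (sigsetjmp(pc_env, 1) == 0)                 // 1: also save the signal mask
        *static_cast<volatile int *>(mem) = 0;     // faults: page has no access rights

    sigaction(SIGSEGV, &old, nullptr);             // restore the previous handler
    munmap(mem, page);
    return seen_pc;
}
```

siglongjmp is used only so the demo can return normally. A real crash reporter would more likely print the address (with write(2), since iostreams are not async-signal-safe) and then let the signal take its default action, as SA_RESETHAND does in the program above.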
By the way, GCC has this nasty habit of moving labels whose address is taken but not used in a control transfer operation (as far as it can tell). Compare:
$ g++ -O0 -W -Wall test.cc && ./a.out
0x400a30 ~ 0x400acd ~ 0x400ada
si:0
ip:400ad4
Segmentation fault
$ g++ -O2 -W -Wall test.cc && ./a.out
0x4009f0 ~ 0x4009f0 ~ 0x4009f0
si:0
ip:400ab4
Segmentation fault
See how all the labels are at the same address in the optimized version? That's going to trip up any attempt to, say, recover from the fault by adjusting the PC. IIRC there is a way to make GCC not do that, but I don't know what it is and wasn't able to find it in the manual.
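If the goal is to recover rather than to report, one way to sidestep label placement entirely is to jump out of the handler with sigsetjmp/siglongjmp. Note that this is a different technique from adjusting the saved PC. A minimal sketch, assuming POSIX and a synchronous SIGSEGV; try_read and make_guard_page are hypothetical helpers, and the guard page is deliberately leaked to keep the example short:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   // for MAP_ANONYMOUS on glibc
#endif
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_env;

static void jump_back(int, siginfo_t *, void *)
{
    // Unwind to the sigsetjmp point instead of editing the saved PC.
    siglongjmp(recover_env, 1);
}

// Hypothetical helper: copy *p into *out, returning false instead of
// crashing if the read raises SIGSEGV.
bool try_read(const volatile int *p, int *out)
{
    struct sigaction sa{}, old{};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = jump_back;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(recover_env, 1) != 0) {
        // Arrived here via siglongjmp from the handler: the read faulted.
        sigaction(SIGSEGV, &old, nullptr);
        return false;
    }
    *out = *p;   // may fault
    sigaction(SIGSEGV, &old, nullptr);
    return true;
}

// Hypothetical demo helper: one page with no access rights, so any read faults.
void *make_guard_page()
{
    void *p = mmap(nullptr, sysconf(_SC_PAGESIZE), PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

With an ordinary int x, try_read(&x, &v) returns true and copies x into v. Given a pointer into the guard page, it returns false and leaves v untouched. Saving the signal mask (the 1 passed to sigsetjmp) matters: without it SIGSEGV would stay blocked after the jump, and the next fault would typically kill the process.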
The siginfo_t isn't going to work, because its si_addr field contains the memory address that was accessed, not the address of the instruction that accessed it.
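A quick way to see this is to fault on a known address and compare it with si_addr. The sketch below assumes Linux (mmap with MAP_ANONYMOUS and PROT_NONE) and uses a hypothetical fault_offset helper. It writes one byte at a chosen offset into an inaccessible page and returns the offset that si_addr reported, so the result tracks the data address rather than any code address.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   // for MAP_ANONYMOUS on glibc
#endif
#include <cstddef>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf addr_env;
static void *volatile seen_addr;

static void record_addr(int, siginfo_t *si, void *)
{
    seen_addr = si->si_addr;   // the memory address that was accessed
    siglongjmp(addr_env, 1);
}

// Hypothetical helper: write one byte at `offset` inside a PROT_NONE page and
// return how far into the page si_addr pointed, or -1 if nothing faulted.
long fault_offset(std::size_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset >= static_cast<std::size_t>(page))
        return -1;
    void *raw = mmap(nullptr, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;

    struct sigaction sa{}, old{};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = record_addr;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    seen_addr = nullptr;
    if (sigsetjmp(addr_env, 1) == 0)
        static_cast<volatile char *>(raw)[offset] = 1;   // faults: no write permission

    sigaction(SIGSEGV, &old, nullptr);
    long result = seen_addr
        ? static_cast<char *>(seen_addr) - static_cast<char *>(raw)
        : -1;
    munmap(raw, page);
    return result;
}
```

On Linux, fault_offset(100) should return 100. This is also why the question's program prints si:0: the data address it dereferenced was NULL.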
Now, __builtin_return_address is interesting. On my machine it returns some nonsense:
0x40089f ~ 0x400935 ~ 0x40093f
si:0
At: 0x7fe22916fc20
At: 0x7fe22915ad8e
I have no idea why. But then I examined the core dump:
(gdb) bt
#0 0x00000000004009ff in handler(int, siginfo*, void*) ()
#1 <signal handler called>
#2 0x0000000000400939 in main ()
As you can see, just like in your case, the offending address is somewhere in between label locations. This is easily explained, though. Just look at the disassembly of main():
(gdb) disas
Dump of assembler code for function main:
...
; the label is here:
0x0000000000400935 <+161>: mov -0x8(%rbp),%rax
=> 0x0000000000400939 <+165>: movl $0x0,(%rax)
0x000000000040093f <+171>: mov $0x400c32,%esi
The labelled statement consists of several instructions. The first one loads the pointer value into the RAX register; it completes successfully because there is nothing wrong with it. It's the second one that dereferences the address and faults. This explains why the address in your trace is a bit different from the address of the label, although your code will probably differ from my example. None of this explains why __builtin_return_address gives nonsense in my case, though.