gdb: set a breakpoint for a SIGBUS handler

2019-03-19 16:01发布

问题:

I'm trying to debug a simple stop-and-copy garbage collector (written in C) using GDB. The GC works by handling SIGBUS. I've set a breakpoint at the top of my SIGBUS signal handler. I've told GDB to pass SIGBUS to my program. However, it doesn't appear to work.

The following program (explained inline) shows the essence of my problem:

#include <stdio.h>
#include <sys/mman.h>
#include <assert.h>
#include <signal.h>

#define HEAP_SIZE 4096

unsigned long int *heap;

void gc(int n) {
  signal(SIGBUS, SIG_DFL); // just for debugging
  printf("GC TIME\n");
}

int main () {

  // Allocate twice the required heap size (two semi-spaces)
  heap = mmap(NULL, HEAP_SIZE * 2, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED,
              -1, 0);
  assert (heap != MAP_FAILED);
  // 2nd semi-space is unreadable. Using "bump-pointer allocation", a SIGBUS
  // tells us we are out of space and need to GC.
  void *guard = mmap(heap + HEAP_SIZE, HEAP_SIZE, PROT_NONE, MAP_ANON |
                     MAP_SHARED | MAP_FIXED, -1, 0);
  assert (guard != MAP_FAILED);
  signal(SIGBUS, gc);
  heap[HEAP_SIZE] = 90; // pretend we are out of heap space
  return 0;
} 

I compile and run the program on Mac OS X 10.6 and get the output I expect:

$ gcc debug.c
$ ./a.out
GC TIME
Bus error

I want to run and debug this program using GDB. In particular, I want to set a breakpoint at the gc function (really, the gc signal handler). Naturally, I need to tell GDB to not halt on SIGBUS as well:

$ gdb ./a.out 
GNU gdb 6.3.50-20050815 (Apple version gdb-1346) (Fri Sep 18 20:40:51 UTC 2009)
... snip ...
(gdb) handle SIGSEGV SIGBUS nostop noprint
Signal        Stop  Print   Pass to program Description
SIGBUS        No    No  Yes     Bus error
SIGSEGV       No    No  Yes     Segmentation fault
(gdb) break gc
Breakpoint 1 at 0x100000d6f

However, we never reach the breakpoint:

(gdb) run
Starting program: /snip/a.out 
Reading symbols for shared libraries +. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000100029000
0x0000000100000e83 in main ()
(gdb) 

Apparently, the signal handler is not invoked (GC TIME is not printed). Furthermore, we're still in main(), at the faulting mov:

0x0000000100000e83 <main+247>:  movq   $0x5a,(%rax)

Any ideas?

Thanks.

回答1:

The same code (modified to handle SIGSEGV too) works as expected in GDB on Linux; it may be a bug in OS X or GDB's port to that platform.

Googling finds broken OS X behavior just like yours all the way back on 10.1, with a sort-of workaround (set inferior-bind-exception-port off before running the program).

(There's a similar bug on Windows.)



回答2:

Internally, bad memory accesses result in the Mach exception EXC_BAD_ACCESS being sent to the program. Normally, this is translated into a SIGBUS UNIX signal. However, gdb intercepts Mach exceptions directly, before the signal translation. The solution is to give gdb the command set dont-handle-bad-access 1 before running your program. Then the normal mechanism is used, and breakpoints inside your signal handler are honored.



回答3:

How about putting a for( ;; ); after the printf(), running the program normally, and then connecting to the process with gdb after GC TIME prints out?



回答4:

Why do you expect to get a SIGBUS? SIGBUS usually means alignment error, on an architecture where some datatypes have alignment requirements. It looks like you are just trying to access memory outside your allocated area, and I would expect that you get SIGSEGV instead of SIGBUS then.

Edit:

It seems that the Mac OS X name for what I think of as SIGSEGV is SIGBUS. Therefore, ignore this answer.

If it's any help, the breakpoint works as expected (i.e., it works) when I try the program on a Linux system, with SIGBUS replaced by SIGSEGV.

Edit 2:

Can you try to catch SIGSEGV in your program too? It seems that the type of signal may vary depending on where the memory is mapped in Mac OS X (I just read the discussion here and here), and just perhaps a different signal could be thrown when you run in the debugger?