Trying to single step through program with trap flag

Posted 2019-04-11 23:44

Question:

I'd like to create a complete instruction trace of a program's execution, to collect some statistics. I first tried using Linux's ptrace functionality to step through a program (using the tutorial here). This creates two processes, the traced one and the debugger, which communicate via signals. I only got around 16K instructions per second (on a 1.6 GHz Atom), which is too slow for anything non-trivial.
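For reference, the ptrace approach boils down to a loop like the one below. This is a minimal sketch under stated assumptions, not the tutorial's code: count_single_steps is a hypothetical helper, and it forks a child that runs a function instead of exec'ing another program. Every traced instruction costs a PTRACE_SINGLESTEP request, a stop, and a waitpid round trip between two processes, which is a plausible reason for the low instructions-per-second figure.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run work() in a forked child under ptrace and count
   the single-step traps the tracer sees before the child exits.
   Returns -1 on error. Linux-only sketch. */
long count_single_steps(void (*work)(void))
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Tracee: ask to be traced, then stop until the tracer is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        if (work != NULL)
            work();
        _exit(0);
    }

    int status;
    /* Tracer: wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    long steps = 0;
    int sig = 0; /* signal to deliver on resume; 0 suppresses the stop signal */
    for (;;) {
        /* Resume the child for exactly one instruction. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)sig) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WSTOPSIG(status) == SIGTRAP) {
            /* SIGTRAP marks one completed step. */
            steps++;
            sig = 0;
        } else {
            /* Some other signal stopped the child: forward it on resume. */
            sig = WSTOPSIG(status);
        }
    }
    return steps;
}
```

The count covers the child's instructions from its return out of raise(SIGSTOP) up to _exit, so count_single_steps(NULL) measures only that fixed overhead.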

I suspected that the inter-process communication via signals was the bottleneck, so I tried setting up the debugging in the same process as the execution: set the trap flag and install a signal handler. When a software interrupt is used to make a syscall, the trap flag should be saved and the kernel would use its own flags, or so I thought. But my program somehow gets killed by signal SIGTRAP.

This is what I set up:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>

int cycle = 0;
void trapHandler(int signum) {
  if (cycle % 262144 == 0) {
    write(STDOUT_FILENO," trap\n",6);
  }
  cycle += 1;
}

void startTrace() {
  // set up signal handler
  signal(SIGTRAP, trapHandler);

  // set trap flag
  asm volatile("pushfl\n"
               "orl $0x100, (%esp)\n"
               "popfl\n"
               );
}

void printRock() {
  const char* s = "Rock\n";
  asm(
      "movl $5, %%edx\n" // message length
      "movl %0, %%ecx\n" // message to write
      "movl $1, %%ebx\n" // file descriptor (stdout)
      "movl $4, %%eax\n" // system call number (sys_write)
      "int  $0x80\n"   // syscall
      : // no output regs
      : "r"(s) // input text
      : "edx","ecx","ebx","eax","memory"
      );
}

int main() {
  startTrace();

  // some computation                                                                                                              
  int x = 0;
  int i;
  for (i = 0; i < 100000; i++) {
    x += i*2;
  }

  printRock();
  write(STDOUT_FILENO,"Paper\n",6);
  write(STDOUT_FILENO,"Scissors\n",9);
}

When running, this gives:

 trap
 trap
 trap
Rock
Paper
 trap
Trace/breakpoint trap (core dumped)

So now we get about 250K instructions per second: still slow, but non-trivial executions become possible. However, there is a core dump that appears to happen between the two write calls. In GDB we can see where it happens:

Dump of assembler code for function __kernel_vsyscall:
   0xb76f3414 <+0>:  push   %ecx
   0xb76f3415 <+1>:  push   %edx
   0xb76f3416 <+2>:  push   %ebp
   0xb76f3417 <+3>:  mov    %esp,%ebp
   0xb76f3419 <+5>:  sysenter 
   0xb76f341b <+7>:  nop
   0xb76f341c <+8>:  nop
   0xb76f341d <+9>:  nop
   0xb76f341e <+10>: nop
   0xb76f341f <+11>: nop
   0xb76f3420 <+12>: nop
   0xb76f3421 <+13>: nop
   0xb76f3422 <+14>: int    $0x80
=> 0xb76f3424 <+16>: pop    %ebp
   0xb76f3425 <+17>: pop    %edx
   0xb76f3426 <+18>: pop    %ecx
   0xb76f3427 <+19>: ret 

And the backtrace:

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0  0xb77c5424 in __kernel_vsyscall ()
#1  0xb76d0553 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#2  0x0804847d in trapHandler (signum=5) at count.c:8
#3  <signal handler called>
#4  0xb77c5424 in __kernel_vsyscall ()
#5  0xb76d0553 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#6  0x08048537 in main () at count.c:49

It appears that syscalls made directly via int $0x80 are fine, but the write calls, which go through the kernel's VDSO/vsyscall mechanism, break somehow (I didn't know about this mechanism; it is described in more detail here). It may be related to using sysenter rather than int $0x80; maybe the trap flag survives when stepping into the kernel. I don't quite understand what's going on with the recursive __kernel_vsyscall calls, and I also don't understand why there's an int $0x80 instruction inside the __kernel_vsyscall function.

Does anybody have a suggestion about what's going on and how to fix it? Maybe it's possible to disable the VDSO/vsyscall? Or is it possible to override the __kernel_vsyscall function with one that uses int $0x80 rather than sysenter?

Answer 1:

Answering my own question. I haven't figured out what was happening or explained it in detail, but I found a workaround: disable the VDSO. That can be done with:

sudo sysctl vm.vdso_enabled=0

With this, single-stepping through a program works, including stepping across system calls. Disclaimer: don't blame me if things go wrong.
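To check whether the sysctl actually took effect for a newly started process, the program can ask the kernel where it mapped the vDSO via the ELF auxiliary vector. This is a small sketch assuming glibc 2.16 or later (for getauxval); vdso_base and kernel_vsyscall_entry are hypothetical helper names. On 32-bit x86, AT_SYSINFO holds the address of __kernel_vsyscall, the function from the GDB dump above. With vm.vdso_enabled=0 I would expect both helpers to return 0.

```c
#include <sys/auxv.h>

/* Base address of the vDSO image mapped into this process,
   or 0 if the kernel did not map one. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* On 32-bit x86: address of __kernel_vsyscall, or 0 if absent.
   On other architectures this sketch simply returns 0. */
unsigned long kernel_vsyscall_entry(void)
{
#if defined(__i386__)
    return getauxval(AT_SYSINFO);
#else
    return 0;
#endif
}
```

Printing these two values before and after changing the sysctl gives a quick confirmation that new processes no longer get a vDSO.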

EDIT: After updating my Linux system (32-bit x86) much later, this error no longer occurs. Maybe it was a bug that has since been fixed.