I'm trying to use ptrace to trace all syscalls made by a separate process, be it 32-bit (IA-32) or 64-bit (x86-64). My tracer would run on a 64-bit x86 installation with IA-32 emulation enabled, but ideally would be able to trace both 64-bit and 32-bit applications, including if a 64-bit application forks and execs a 32-bit process.
The issue is that, since 32-bit and 64-bit syscall numbers differ, I need to know whether a process is 32-bit or 64-bit to determine which syscall it used, even if I have the syscall number. There seem to be imperfect methods, like checking /proc/<pid>/exe or (as strace does) the size of the registers struct, but nothing reliable.
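The /proc/<pid>/exe heuristic boils down to reading the ELF class byte of the binary the tracee exec'd. Here is a minimal sketch, assuming glibc's <elf.h>. The function names are hypothetical. It only describes the executable. It says nothing about the mode the code is currently running in, or which instruction made a particular syscall, which is why it cannot settle the question on its own.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <elf.h>

/* Hypothetical helper: returns 32 or 64 for an ELF identification
 * buffer, or 0 if the buffer is too short or is not an ELF header. */
static int elf_class_from_ident(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;
    }
}

/* Hypothetical helper: bitness of the file the tracee exec'd.
 * It does not reflect mode switches or int $0x80 use after exec. */
int exe_elf_class(pid_t pid)
{
    char path[64];
    unsigned char ident[EI_NIDENT];

    snprintf(path, sizeof path, "/proc/%d/exe", (int)pid);
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    return elf_class_from_ident(ident, n);
}
```

exe_elf_class(pid) gives 64 for a normal x86-64 binary and 32 for an IA-32 one, and 0 if the file cannot be read or is not ELF.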
Complicating this is the fact that 64-bit processes can switch from 64-bit mode into 32-bit compatibility mode to execute 32-bit code directly. They can also make 32-bit int $0x80 syscalls, which, of course, use the 32-bit syscall numbers. I don't "trust" the processes I trace not to use these tricks, so I want to detect them correctly. And I've independently verified that, at least in the latter case, ptrace sees the 32-bit syscall numbers and argument register assignments, not the 64-bit ones.
I poked around in the kernel source and came across the TS_COMPAT flag in arch/x86/include/asm/processor.h, which appears to be set whenever a 32-bit syscall is made by a 64-bit process. The only problem is that I have no idea how to access this flag from userland, or if it is even possible.
I also thought about reading %cs and comparing it to $0x23 or $0x33, inspired by this method for switching bitness in a running process. But this only detects 32-bit processes. It does not necessarily catch 32-bit syscalls (those made with int $0x80) from a 64-bit process. It's also fragile, since it relies on undocumented kernel behavior.
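A sketch of the %cs check follows. It assumes x86-64 Linux with glibc, where struct user_regs_struct comes from <sys/user.h>. The selector values 0x23 and 0x33 are what current Linux kernels use for 32-bit and 64-bit user code segments. That is an implementation detail rather than a documented ABI, so treat it as an assumption. The helper names are hypothetical. This reports the CPU mode at the stop, so a process in 64-bit mode that issues int $0x80 still reads as 64-bit.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/user.h>

/* Assumed Linux selector values (not a stable ABI guarantee):
 * __USER32_CS = 0x23 (compatibility mode), __USER_CS = 0x33 (64-bit mode). */
#define LINUX_USER32_CS 0x23
#define LINUX_USER_CS   0x33

enum cpu_mode { CPU_MODE_UNKNOWN = 0, CPU_MODE_32 = 32, CPU_MODE_64 = 64 };

/* Hypothetical helper: map a CS value to the code-segment mode. */
static enum cpu_mode mode_from_cs(unsigned long long cs)
{
    switch (cs & 0xffff) {
    case LINUX_USER32_CS: return CPU_MODE_32;
    case LINUX_USER_CS:   return CPU_MODE_64;
    default:              return CPU_MODE_UNKNOWN; /* e.g. a custom LDT segment */
    }
}

/* Hypothetical helper: the tracee must already be in a ptrace-stop. */
enum cpu_mode tracee_cpu_mode(pid_t pid)
{
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return CPU_MODE_UNKNOWN;
    return mode_from_cs(regs.cs);
}
```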
Finally, I noticed that the x86 architecture has a long-mode bit in the Extended Feature Enable Register (EFER) MSR. But ptrace has no way of reading an MSR from a tracee, and I suspect that reading it from within my tracer would be inadequate because my tracer is always running in long mode. (EFER.LMA also stays set while 32-bit code runs in compatibility mode, so it could not distinguish the two cases anyway.)
I'm at a loss. Perhaps I could use one of those hacks. At this point I'm leaning towards the %cs or the /proc/<pid>/exe method. But I want something durable that will actually distinguish between 32-bit and 64-bit syscalls. How can a process using ptrace under x86-64, once it has detected that its tracee made a syscall, reliably determine whether that syscall was made with the 32-bit (int $0x80) or 64-bit (syscall) ABI? Is there some other way for a user process to get this information about another process that it is authorized to ptrace?
Interesting, I hadn't realized that there wasn't an obvious smarter way that strace could use to correctly decode int 0x80 from 64-bit processes. (This is being worked on. See this answer for links to a proposed kernel patch that adds PTRACE_GET_SYSCALL_INFO to the ptrace API. strace 4.26 already supports it on patched kernels.)
As a workaround, I think you could disassemble the code just before RIP and check whether it was the syscall instruction (0F 05) or not, because ptrace does let you read the target process's memory. At a syscall stop, the saved RIP points to the instruction after the one that trapped. Both syscall and int 0x80 (CD 80) are 2 bytes long, so the bytes at RIP-2 identify the instruction.
But for a security use case like disallowing some system calls, this would be vulnerable to a race condition. Another thread in the traced process could rewrite the syscall bytes to int 0x80 after they execute, but before you can peek at them with ptrace.
You only need to do this check if the process is running in 64-bit mode. Otherwise only the 32-bit ABI is available. (On AMD CPUs that support syscall in 32-bit mode but not sysenter, the vDSO page can potentially use syscall from 32-bit mode. Skipping the check for 32-bit processes avoids this corner case.) It sounds like you already have a reliable way to detect the mode, at least.
(I haven't used the ptrace API directly, only tools like strace that use it. So I hope this answer makes sense.)
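Here is a sketch that combines both ideas. It assumes x86-64 Linux and glibc.

It first asks the kernel with PTRACE_GET_SYSCALL_INFO, on kernels that provide it (I believe mainline since Linux 5.3). The arch field of that request reports AUDIT_ARCH_I386 or AUDIT_ARCH_X86_64 for the current syscall. On x86-64 that value appears to be derived from the same TS_COMPAT status mentioned in the question. On older kernels, the request fails and the sketch falls back to the %cs selector plus the two bytes just before RIP, which is subject to the race described above.

The numeric constants are copied from the kernel headers so that <linux/ptrace.h>, which can clash with <sys/ptrace.h>, is not needed. struct syscall_info_head and the function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/user.h>

/* Constants copied from kernel uapi headers. */
#define REQ_GET_SYSCALL_INFO 0x420e      /* PTRACE_GET_SYSCALL_INFO */
#define SYSCALL_INFO_NONE    0           /* PTRACE_SYSCALL_INFO_NONE */
#define ARCH_X86_64          0xC000003Eu /* AUDIT_ARCH_X86_64 */
#define ARCH_I386            0x40000003u /* AUDIT_ARCH_I386 */

/* Hypothetical mirror of the leading fields of struct ptrace_syscall_info;
 * the kernel copies at most as many bytes as the size passed in addr. */
struct syscall_info_head {
    uint8_t  op;
    uint8_t  pad[3];
    uint32_t arch;
    uint64_t instruction_pointer;
    uint64_t stack_pointer;
};

enum syscall_abi { ABI_UNKNOWN, ABI_X86_64, ABI_I386 };

static enum syscall_abi abi_from_audit_arch(uint32_t arch)
{
    if (arch == ARCH_X86_64) return ABI_X86_64;
    if (arch == ARCH_I386)   return ABI_I386;
    return ABI_UNKNOWN;
}

/* insn holds the two bytes just before the saved RIP at a syscall-entry stop. */
static enum syscall_abi abi_from_insn(int in_64bit_mode, const unsigned char insn[2])
{
    if (!in_64bit_mode)
        return ABI_I386;               /* compat mode: only the 32-bit ABI exists */
    if (insn[0] == 0x0f && insn[1] == 0x05)
        return ABI_X86_64;             /* syscall */
    if (insn[0] == 0xcd && insn[1] == 0x80)
        return ABI_I386;               /* int $0x80 */
    return ABI_UNKNOWN;
}

/* Call while pid is in a syscall-entry stop (the fallback assumes entry). */
enum syscall_abi tracee_syscall_abi(pid_t pid)
{
    struct syscall_info_head info;
    memset(&info, 0, sizeof info);

    /* Preferred path: the kernel reports which ABI the current syscall used. */
    long n = ptrace(REQ_GET_SYSCALL_INFO, pid, (void *)sizeof info, (void *)&info);
    if (n > 0)
        return info.op == SYSCALL_INFO_NONE ? ABI_UNKNOWN
                                            : abi_from_audit_arch(info.arch);

    /* Fallback for kernels without the request: racy, since another
     * thread can rewrite the instruction bytes after the trap. */
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return ABI_UNKNOWN;
    unsigned long long sel = regs.cs & 0xffff;
    if (sel != 0x23 && sel != 0x33)    /* assumed Linux __USER32_CS / __USER_CS */
        return ABI_UNKNOWN;

    unsigned char insn[2] = {0, 0};
    if (sel == 0x33) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKTEXT, pid,
                           (void *)(uintptr_t)(regs.rip - 2), NULL);
        if (word == -1 && errno != 0)
            return ABI_UNKNOWN;
        memcpy(insn, &word, sizeof insn); /* little-endian: bytes at rip-2, rip-1 */
    }
    return abi_from_insn(sel == 0x33, insn);
}
```

Call tracee_syscall_abi(pid) when waitpid() reports a syscall-entry stop. I believe the kernel only fills op with an entry or exit value when the tracee was set up with PTRACE_O_TRACESYSGOOD, so set that option with PTRACE_SETOPTIONS. Only the PTRACE_GET_SYSCALL_INFO path is immune to the byte-rewriting race.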