The x86_64 SysV ABI's function calling convention defines integer argument #4 to be passed in the rcx register. The Linux kernel syscall ABI, on the other hand, uses r10 for that same purpose. All other arguments are passed in the same registers for both functions and syscalls.
This leads to some strange things. Check out, for example, the implementation of mmap in glibc for the x32 platform (for which the same discrepancy exists):
    00432ce0 <__mmap>:
      432ce0:  49 89 ca          mov    %rcx,%r10
      432ce3:  b8 09 00 00 40    mov    $0x40000009,%eax
      432ce8:  0f 05             syscall
So all registers are already in place, except that rcx has to be moved to r10.
I am wondering why not define the syscall ABI to be the same as the function call ABI, considering that they are already so similar.
The syscall instruction is intended to provide a quicker method of entering Ring-0 in order to carry out a system call. This is meant to be an improvement over the old method, which was to raise a software interrupt (int 0x80 on Linux).
Part of the reason the instruction is faster is that it does not touch memory, and does not even switch rsp to point at a kernel stack. A software interrupt forces the CPU to save enough state that the OS can resume the interrupted code without clobbering anything. With syscall, the CPU is allowed to assume that the software knows a kernel entry is happening at this point.
In particular, syscall stores two parts of the user-space state in registers. The RIP to return to after the call is stored in rcx, and the flags are stored in r11 (because RFLAGS is masked with a kernel-supplied value before entry to the kernel). This means that both of those registers are clobbered by the instruction.
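One way to see the clobbering for yourself is a small probe, sketched here in C with GCC/Clang inline assembly and assuming x86-64 Linux. The names probe_getpid, probe_result, print_probe and the marker constants are made up for this example. It loads markers into rcx and r11, executes syscall for getpid, and hands back whatever the two registers contain afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid */

/* Arbitrary marker values (made up for this sketch). */
#define MARK_RCX 0x1111111111111111ULL
#define MARK_R11 0x2222222222222222ULL

/* Hypothetical helper type: register values observed right after `syscall`. */
struct probe_result {
    long     ret;   /* rax: return value of getpid */
    uint64_t rcx;   /* rcx after the instruction */
    uint64_t r11;   /* r11 after the instruction */
};

struct probe_result probe_getpid(void)
{
    struct probe_result out = { 0, MARK_RCX, MARK_R11 };
#if defined(__x86_64__) && defined(__linux__)
    /* Pin the marker to r11; rcx has its own constraint letter ("c"). */
    register uint64_t r11 __asm__("r11") = MARK_R11;
    uint64_t rcx = MARK_RCX;
    long rax = SYS_getpid;

    /* "+" marks each operand as read and written, so the compiler reads
       back whatever the instruction (and the kernel) left in the register. */
    __asm__ volatile ("syscall"
                      : "+a"(rax), "+c"(rcx), "+r"(r11)
                      :
                      : "memory");
    out.ret = rax;
    out.rcx = rcx;
    out.r11 = r11;
#else
    /* Other targets: no `syscall` instruction to probe; markers stay as-is. */
    out.ret = (long)getpid();
#endif
    return out;
}

/* Usage sketch: print what came back and sanity-check the return value. */
void print_probe(void)
{
    struct probe_result r = probe_getpid();
    assert(r.ret == (long)getpid());
    printf("rax=%ld rcx=%#llx r11=%#llx\n", r.ret,
           (unsigned long long)r.rcx, (unsigned long long)r.r11);
}
```

Calling print_probe() on a native x86-64 Linux machine should show rcx and r11 no longer holding the markers. The exact values depend on where the code was loaded and on the flags at the time, and emulators or sandboxes may behave differently.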
Since they are clobbered, the syscall ABI uses another register instead of rcx, hence the use of r10 for the 4th argument.
r10 is a natural choice, since in the x86-64 System V ABI it's not used for passing function args, and functions don't need to preserve their caller's value of r10. So a syscall wrapper function can mov %rcx, %r10 without any save/restore. This wouldn't be possible with any other register, for 6-arg syscalls and the SysV ABI's function calling convention: the only other call-clobbered registers that don't carry arguments are rax, which holds the syscall number, and r11, which syscall itself overwrites.
BTW, the 32-bit system call ABI is also accessible with sysenter, which requires cooperation between user-space and kernel-space to allow returning to user-space after a sysenter (i.e. storing some state in user-space before running sysenter). This is higher performance than int 0x80, but awkward. Still, glibc uses it (by jumping to user-space code in the vdso pages that the kernel maps into the address space of every process).
AMD's syscall is another approach to the same idea as Intel's sysenter: to make entry/exit from the kernel less expensive by not preserving absolutely everything.
AMD's syscall clobbers the rcx register, thus r10 is used instead.