It\'s been a while since I last coded arm assembler and I\'m a little rusty on the details. If I call a C function from arm, I only have to worry about saving r0-r3 and lr, right?
If the C function uses any other registers, is it responsible for saving those on the stack and restoring them? In other words, the compiler would generate code to do this for C functions.
For example if I use r10 in an assembler function, I don\'t have to push its value on the stack, or to memory, and pop/restore it after a C call, do I?
This is for arm-eabi-gcc 4.3.0.
It depends on the ABI for the platform you are compiling for. On Linux, there are two ARM ABIs; the old one and the new one. AFAIK, the new one (EABI) is in fact ARM\'s AAPCS. The complete EABI definitions currently live here on ARM\'s infocenter.
From the AAPCS, §5.1.1:
- r0-r3 are the argument and scratch registers; r0-r1 are also the result registers
- r4-r8 are callee-save registers
- r9 might be a callee-save register or not (on some variants of AAPCS it is a special register)
- r10-r11 are callee-save registers
- r12-r15 are special registers
A callee-save register must be saved by the callee (in opposition to a caller-save register, where the caller saves the register); so, if this is the ABI you are using, you do not have to save r10 before calling another function (the other function is responsible for saving it).
Edit: Which compiler you are using makes no difference; gcc in particular can be configured for several different ABIs, and it can even be changed on the command line. Looking at the prologue/epilogue code it generates is not that useful, since it is tailored for each function and the compiler can use other ways of saving a register (for instance, saving it in the middle of a function).
To add up missing info on NEON registers:
From the AAPCS, §5.1.1 Core registers:
- r0-r3 are the argument and scratch registers; r0-r1 are also the result registers
- r4-r8 are callee-save registers
- r9 might be a callee-save register or not (on some variants of AAPCS it is a special register)
- r10-r11 are callee-save registers
- r12-r15 are special registers
From the AAPCS, §5.1.2.1 VFP register usage conventions:
- s16–s31 (d8–d15, q4–q7) must be preserved
- s0–s15 (d0–d7, q0–q3) and d16–d31 (q8–q15) do not need to be preserved
Original post:
arm-to-c-calling-convention-neon-registers-to-save
For 64-bit ARM, A64 (from Procedure Call Standard for the ARM 64-bit Architecture)
There are thirty-one, 64-bit, general-purpose (integer) registers visible to the A64 instruction set; these are labeled r0-r30. In a 64-bit context these registers are normally referred to using the names x0-x30; in a 32-bit context the registers are specified by using w0-w30. Additionally, a stack-pointer register, SP, can be used with a restricted number of instructions.
- SP The Stack Pointer
- r30 LR The Link Register
- r29 FP The Frame Pointer
- r19…r28 Callee-saved registers
- r18 The Platform Register, if needed; otherwise a temporary register.
- r17 IP1 The second intra-procedure-call temporary register (can be used
by call veneers and PLT code); at other times may be used as a
temporary register.
- r16 IP0 The first intra-procedure-call scratch register (can be used by call
veneers and PLT code); at other times may be used as a
temporary register.
- r9…r15 Temporary registers
- r8 Indirect result location register
- r0…r7 Parameter/result registers
The first eight registers, r0-r7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
Registers r16 (IP0) and r17 (IP1) may be used by a linker as a scratch register between a routine and any subroutine it calls. They can also be used within a routine to hold intermediate values between subroutine calls.
The role of register r18 is platform specific. If a platform ABI has need of a dedicated general purpose register to carry inter-procedural state (for example, the thread context) then it should use this register for that purpose. If the platform ABI has no such requirements, then it should use r18 as an additional temporary register. The platform ABI specification must document the usage for this register.
SIMD
The ARM 64-bit architecture also has a further thirty-two registers, v0-v31, which can be used by SIMD and Floating-Point operations. The precise name of the register will change indicating the size of the access.
Note: Unlike in AArch32, in AArch64 the 128-bit and 64-bit views of a SIMD and Floating-Point register do not overlap multiple registers in a narrower view, so q1, d1 and s1 all refer to the same entry in the register bank.
The first eight registers, v0-v7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64-bits of each value stored in v8-v15 need to be preserved; it is the responsibility of the caller to preserve larger values.
The answers of CesarB and Pavel provided quotes from AAPCS, but open issues remain. Does the callee save r9? What about r12? What about r14? Furthermore, the answers were very general, and not specific to the arm-eabi toolchain as requested. Here\'s a practical approach to find out which register are callee-saved and which are not.
The following C code contain an inline assembly block, that claims to modify registers r0-r12 and r14. The compiler will generate the code to save the registers required by the ABI.
void foo() {
asm volatile ( \"nop\" : : : \"r0\", \"r1\", \"r2\", \"r3\", \"r4\", \"r5\", \"r6\", \"r7\", \"r8\", \"r9\", \"r10\", \"r11\", \"r12\", \"r14\");
}
Use the command line arm-eabi-gcc-4.7 -O2 -S -o - foo.c
and add the switches for your platform (such as -mcpu=arm7tdmi
for example).
The command will print the generated assembly code on STDOUT. It may look something like this:
foo:
stmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
nop
ldmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
bx lr
Note, that the compiler generated code saves and restores r4-r11. The compiler does not save r0-r3, r12. That it restores r14 (alias lr) is purely accidental as I know from experience that the exit code may also load the saved lr into r0 and then do a \"bx r0\" instead of \"bx lr\". Either by adding the -mcpu=arm7tdmi -mno-thumb-interwork
or by using -mcpu=cortex-m4 -mthumb
we obtain slightly different assembly code that looks like this:
foo:
stmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
nop
ldmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
Again, r4-r11 are saved and restored. But r14 (alias lr) is not restored.
To summarize:
- r0-r3 are not callee-saved
- r4-r11 are callee-saved
- r12 (alias ip) is not callee-saved
- r13 (alias sp) is callee-saved
- r14 (alias lr) is not callee-saved
- r15 (alias pc) is the program counter and is set to the value of lr prior to the function call
This holds at least for arm-eabi-gcc\'s default\'s. There are command line switches (in particular the -mabi switch) that may influence the results.
There is also difference at least at Cortex M3 architecture for function call and interrupt.
If an Interrupt occurs it will make automatic push R0-R3,R12,LR,PC onto Stack and when return form IRQ automatic POP. If you use other registers in IRQ routine you have to push/pop them onto Stack manually.
I don\'t think this automatic PUSH and POP is made for a Function call (jump instruction). If convention says R0-R3 can be used only as an argument, result or scratch registers, so there is no need to store them before function call because there shouldn\'t be any value used later after function return. But same as in an interrupt you have to store all other CPU registers if you use them in your function.