I need inline assembly code like this:
- I have a balanced pair of push/pop operations inside the assembly
- I also have a variable in memory (so, not in a register) as input

Like this:
__asm__ __volatile__ ("push %%eax\n\t"
// ... some operations that use ECX as a temporary
"mov %0, %%ecx\n\t"
// ... some other operation
"pop %%eax"
: : "m"(foo));
// foo is my local variable, that is to say, on stack
When I disassemble the compiled code, the compiler gives the memory address as 0xc(%esp), which is relative to esp. Hence this fragment of code will not work correctly, since I have a push operation before the mov.
Therefore, how can I tell the compiler that I don't want foo addressed relative to esp, but as something like -8(%ebp), relative to ebp?
P.S. You may suggest that I put eax in the clobbers, but this is just sample code. I'd rather not go into the full reason why I can't accept that solution.
Instead of putting the move into ecx within the assembly code, put the operand in ecx directly:
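A minimal sketch of what that could look like, assuming GCC-style extended asm. The function name double_via_ecx and the doubling arithmetic are hypothetical stand-ins for the real work. It uses "+c" rather than a plain "c" input because the asm modifies ECX, and an input-only operand must not be changed. The push/pop path is limited to 32-bit x86; other targets take a plain C fallback.

```c
/* Hypothetical helper: returns 2 * foo. The arithmetic is a stand-in for
   "some operations that use ECX as a temporary". */
static int double_via_ecx(int foo)
{
    int tmp = foo;
#if defined(__i386__)
    __asm__ __volatile__ (
        "push %%eax\n\t"          /* ESP changes, but no operand is addressed through it */
        "mov %%ecx, %%eax\n\t"    /* the compiler already loaded tmp into ECX */
        "add %%eax, %%ecx\n\t"    /* ECX = 2 * foo */
        "pop %%eax"               /* balanced: EAX and ESP are restored */
        : "+c"(tmp)               /* read-write, because the asm modifies ECX */
        :
        : "cc");
#else
    /* Portable fallback so the sketch builds elsewhere; push/pop is kept to
       32-bit x86, where there is no red zone. */
    tmp = foo + foo;
#endif
    return tmp;
}
```

Since the template contains no memory operand, it no longer matters whether the compiler would have addressed foo through ESP or EBP.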
Modifying ESP inside inline-asm should generally be avoided when you have any memory inputs / outputs, so you don't have to disable optimizations or force the compiler to make a stack-frame with EBP some other way. One major advantage is that you (or the compiler) can then use EBP as an extra free register; potentially a significant speedup if you're already having to spill/reload stuff. If you're writing inline asm, presumably this is a hotspot so it's worth spending the extra code-size to use ESP-relative addressing modes.
In x86-64 code, there's an added obstacle to using push/pop safely, because you can't tell the compiler you want to clobber the red-zone below RSP. (You can compile with -mno-red-zone, but there's no way to disable it from the C source.) You can get problems like this where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red-zone, though, so this only applies to x86-64 System V. (Or non-x86 ISAs with a red-zone.)

You only need -fno-omit-frame-pointer for that function if you want to do asm-only stuff like push as a stack data structure, so there's a variable amount of push. Or maybe if optimizing for code-size.

You can always write a whole non-inline function in asm and put it in a separate file, then you have full control. But only do that if your function is large enough to be worth the call/ret overhead, e.g. if it includes a whole loop; don't make the compiler call a short non-looping function inside a C inner loop, destroying all the call-clobbered registers and having to make sure globals are in sync.

It seems you're using push/pop inside inline asm because you don't have enough registers, and need to save/reload something. You don't need to use push/pop for save/restore. Instead, use dummy output operands with "=m" constraints to get the compiler to allocate stack space for you, and use mov to/from those slots. (Of course you're not limited to mov; it can be a win to use a memory source operand for an ALU instruction if you only need the value once or twice.)

This may be slightly worse for code-size, but is usually not worse for performance (and can be better). If that's not good enough, write the whole function (or the whole loop) in asm so you don't have to wrestle with the compiler.
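As a rough illustration of the dummy-operand idea, here is a minimal sketch, assuming GCC-style extended asm on x86. It is not the answer's own listing: the function square_plus_self, the operand names [val] and [slot], and the arithmetic are all hypothetical. The value is parked in a compiler-chosen stack slot with mov instead of push, then consumed as a memory source operand instead of being popped.

```c
#include <stdint.h>

/* Hypothetical example: computes a*a + a, parking the original a in a
   compiler-allocated stack slot instead of using push/pop. */
static uint32_t square_plus_self(uint32_t a)
{
    uint32_t slot;  /* dummy output: the compiler picks where it lives */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "movl %[val], %[slot]\n\t"   /* save a to the slot (replaces push) */
        "imull %[val], %[val]\n\t"   /* overwrite the register with a*a */
        "addl %[slot], %[val]"       /* add the saved a straight from memory (replaces pop) */
        : [val] "+r"(a), [slot] "=m"(slot)
        :
        : "cc");
#else
    /* Portable fallback so the sketch also builds on non-x86 targets. */
    slot = a;
    a = a * a + slot;
#endif
    return a;
}
```

Because the stack pointer never moves inside the template, whatever address the compiler substitutes for %[slot] stays valid for the whole asm statement, and on x86-64 nothing is written into the red zone behind the compiler's back.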
This compiles to the following asm with gcc7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm-comment showing what the compiler picked for all the template operands: it picked 12(%esp) for %[spill1] and %edi for %[spill2] (because I used "=&rm" for that operand, so the compiler saved/restored %edi outside the asm, and gave it to us for that dummy operand).

Hmm, the dummy memory operand to tell the compiler which memory we modify seems to have resulted in dedicating a register to that, I guess because the p operand is early-clobber so it can't use the same register. I guess you could risk leaving off the early-clobber if you're confident none of the other inputs will use the same register as p (i.e. that they don't have the same value).

The direct use of the stack pointer to reference local variables is probably caused by the use of compiler optimizations. I think you could solve the issue in a couple of ways:
- disable frame-pointer omission (-fno-omit-frame-pointer in GCC);
- list esp in the Clobbers so the compiler will be aware that its value is being modified (check your compiler for compatibility).