data movement error clarification

2020-08-09 04:48发布

问题:

I'm currently solving problem 3.3 from 3rd edition of Computer System: a programmer's perspective and I'm having a hard time understanding what these errors mean...

movb $0xF, (%ebx) gives an error because ebx can't be used as address register

movl %rax, (%rsp) and movb %si, 8(%rbp) gives error saying that theres a mismatch between instruction suffix and register I.D.

movl %eax, %rdx gives an error saying that destination operand incorrect size

why can't we use ebx as address register? Is it because its 32-bit register? Would the following line work if it was movb $0xF, (%rbx) instead? since rbx is of 64bit register?

for the error regarding mismatch between instruction suffix and register I.D, does this error appear because it should've been movq %rax, (%rsp)and movew %si, 8(%rbp) instead of movl %rax, (%rsp) and movb %si, 8(%rbp)?

and lastly, for the error regarding "destination operand incorrect size", is this because the destination register was 64 bit instead of 32? so if the line of code was movl %eax, %edx instead, the error wouldn't have occurred?

any enlightenment would be appreciated.

this is for x86-64

回答1:

movb $0xF, (%ebx) gives an error because ebx can't be used as address register

It's true that ebx can't be used as an address register (for x86-64), but rbx can. ebx is the lower 32bits of rbx. The whole point of 64bit code is that addresses can be 64bits, so trying to reference memory by using a 32bit register makes little sense.

movl %rax, (%rsp) and movb %si, 8(%rbp) gives error saying that 
theres a mismatch between instruction suffix and register I.D.

Yes, because you are using movl, the 'l' means long, which (in this context) means 32bits. However, rax is a 64bit register. If you want to write 64bits out of rax, you should use movq. If you want to write 32bits, you should use eax.

movl %eax, %rdx gives an error saying that destination operand incorrect size

You are trying to move a 32bit value into a 64bit register. There are instructions to do this conversion for you (see cdq for example), but movl isn't one of them.



回答2:

movb $0xF, (%ebx) assembles just fine (with a 0x67 address-size prefix), and executes correctly if the address in ebx is valid.

It might be a bug (and e.g. lead to a segfault from truncating a pointer), or sub-optimal, but if your book makes any stronger claim than that (like that it won't assemble) then your book contains an error.

The only reason you'd ever use that instead of movb $0xF, (%rbx) is if the upper bytes of %rbx potentially held garbage, e.g. in the x32 ABI (ILP32 in long mode), or if you're a dumb compiler that always uses address-size prefixes when targeting 32-bit-pointer mode even when addresses are known to be safely zero-extended.

32-bit address size is actually useful for the x32 ABI for the more common case where an index register holds high garbage, e.g. movl $0x12345, (%edi, %esi,4).

gcc -mx32 could easily emit a movb $0xF, (%ebx) instruction in real life. (Note that -mx32 (32-bit pointers in long mode) is different from -m32 (i386 ABI))

int ext();          // can't inline
void foo(char *p) { 
    ext();          // clobbers arg-passing registers
    *p = 0xf;       // so gcc needs to save the arg for after the call
}

Compiles with gcc7.3 -mx32 -O3 on the Godbolt compiler explorer into

foo(char*):
    pushq   %rbx              # rbx is gcc's first choice of call-preserved reg.
    movq    %rdi, %rbx        # stupid gcc copies the whole 64 bits when only the low 32 are useful
    call    ext()
    movb    $15, (%ebx)       # $15 = $0xF
    popq    %rbx
    ret

mov $edi, %ebx would have been better; IDK why gcc wants to copy the whole 64-bit register when it's treating pointers as 32-bit values. The x32 ABI unfortunately never really caught on on x86 so I guess nobody's put in the time to get gcc to generate great code for it.

AArch64 also has an ILP32 ABI to save memory / cache-footprint on pointer data, so maybe gcc will get better at 32-bit pointers in 64-bit mode in general (benefiting x86-64 as well) if any work for AArch64 ILP32 improves the common cross-architecture parts of this.


so if the line of code was movl %eax, %edx instead, the error wouldn't have occurred?

Right, that would zero-extend EAX into RDX. If you wanted to sign-extend EAX into RDX, use movslq %eax, %rdx (aka Intel-syntax movsxd)

(Almost) all x86 instructions require all their operands to be the same size. (In terms of operand-size; many instructions have a form with an 8-bit or 32-bit immediate that's sign extended to 64-bit or whatever the instruction's operand-size is. e.g. add $1, %eax will use the 3-byte add imm8, r/m32 form.)

Exceptions include shl %cl, %eax, and movzx/movsx.

In AT&T syntax, the sizes of registers have to match the operand-size suffix, if you use one. If you don't, the registers imply an operand-size. e.g. mov %eax, %edx is the same as movl.

Memory + immediate instructions with no register source or destination need an explicit size: add $1, (%rdx) won't assemble because the operand-size is ambiguous, but add %eax, (%rdx) is an addl (32-bit operand-size).

movew %si, 8(%rbp)

No, movw %si, 8(%rbp) would work though :P But note that if you've made a traditional stack frame with push %rbp / mov %rsp, %rbp on function entry, that store to 8(%rbp) will overwrite the low 16 bits of your return address on the stack.

But there's no requirement in x86-64 code for Windows or Linux that you have %rbp pointing there, or holding a valid pointer at all. It's just a call-preserved register like %rbx that you can use for whatever you want as long as you restore the caller's value before returning.