In the x86-64 Tour of Intel Manuals, I read
Perhaps the most surprising fact is that an instruction such as
MOV EAX, EBX
automatically zeroes upper 32 bits of RAX register.
The Intel documentation (3.4.1.1 General-Purpose Registers in 64-Bit Mode in manual Basic Architecture) quoted at the same source tells us:
- 64-bit operands generate a 64-bit result in the destination general-purpose register.
- 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
- 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.
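As a rough illustration of the three rules quoted above, here is a small Python model of a register write. It is only a sketch: `write_reg` is a hypothetical helper invented for this example, it treats the register as a plain 64-bit integer, and the 8-bit case models the low byte only (AL, not AH).

```python
MASK64 = (1 << 64) - 1

def write_reg(old, value, width):
    """Return the 64-bit register contents after a `width`-bit result is written.

    Hypothetical helper: `old` is the previous 64-bit value, `value` is the result.
    """
    low_mask = (1 << width) - 1
    if width in (32, 64):
        # 64-bit results replace the register; 32-bit results are zero-extended,
        # so the old contents play no part in the new value.
        return value & low_mask
    if width in (8, 16):
        # 8-bit and 16-bit results leave the upper 56 or 48 bits unmodified,
        # so the new value is a merge of old and new bits.
        return (old & ~low_mask & MASK64) | (value & low_mask)
    raise ValueError("width must be 8, 16, 32 or 64")

rax = 0x1122334455667788
print(hex(write_reg(rax, 0xAABBCCDD, 32)))  # 0xaabbccdd
print(hex(write_reg(rax, 0xAABB, 16)))      # 0x112233445566aabb
print(hex(write_reg(rax, 0xAA, 8)))         # 0x11223344556677aa
```

Only the 32-bit and 64-bit cases produce a value that is independent of what the register held before.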
In x86-32 and x86-64 assembly, 16 bit instructions such as
mov ax, bx
don't show this kind of "strange" behaviour: the upper word of eax is left untouched rather than zeroed.
Thus: what is the reason why this behaviour was introduced? At first glance it seems illogical (but the reason might be that I am used to the quirks of x86-32 assembly).
It simply saves space in the instructions, and the instruction set. You can move small immediate values to a 64-bit register by using existing (32-bit) instructions.
It also saves you from having to encode 8 byte values for MOV RAX, 42, when MOV EAX, 42 can be reused.
This optimization is not as important for 8 and 16 bit ops (because they are smaller), and changing the rules there would also break old code.
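To put numbers on the space saving, the sketch below builds the machine-code bytes for the two forms by hand and compares their lengths. The variable names are made up for this example. The encodings assumed are the standard ones: opcode B8+rd with a 32-bit immediate, and the same opcode with a REX.W prefix (0x48) and a 64-bit immediate. An assembler may instead pick a 7-byte form for MOV RAX, 42 (REX.W C7 /0 with a sign-extended 32-bit immediate), which is still longer than the 32-bit instruction.

```python
# mov eax, 42 : opcode B8+rd (rd = 0 for EAX), then a 4-byte little-endian immediate
mov_eax_imm32 = bytes([0xB8]) + (42).to_bytes(4, "little")

# mov rax, 42 with a full 64-bit immediate: REX.W prefix, B8+rd, 8-byte immediate
mov_rax_imm64 = bytes([0x48, 0xB8]) + (42).to_bytes(8, "little")

# mov rax, 42 via REX.W C7 /0 (ModRM 0xC0 selects RAX), sign-extended 4-byte immediate
mov_rax_simm32 = bytes([0x48, 0xC7, 0xC0]) + (42).to_bytes(4, "little")

for name, code in [("mov eax, imm32", mov_eax_imm32),
                   ("mov rax, imm64", mov_rax_imm64),
                   ("mov rax, simm32", mov_rax_simm32)]:
    print(f"{name:16} {len(code):2} bytes  {code.hex()}")
```

The lengths come out as 5, 10 and 7 bytes respectively, and thanks to the zero-extension rule the 5-byte form leaves the same value in RAX.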
I'm not AMD or speaking for them, but I would have done it the same way, because zeroing the high half doesn't create a dependency on the previous value that the CPU would have to wait on. The register renaming mechanism would essentially be defeated if it weren't done that way. This way you can write fast 32-bit code in 64-bit mode without having to explicitly break dependencies all the time. Without this behaviour, every single 32-bit instruction in 64-bit mode would have to wait on something that happened before, even though that high part would almost never be used.
The behaviour for 16-bit instructions is the strange one: each 16-bit write has to be merged with the old upper bits of the register. That dependency madness is one of the reasons that 16-bit instructions are avoided now.
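The dependency argument can be illustrated with a toy model. This is only a sketch: `chain_depth` and its instruction format are hypothetical, and real microarchitectures have handled partial registers in various ways that this ignores. It computes the longest chain of instructions that must wait on each other, first when a write replaces the whole register and then when a write must merge with the old contents.

```python
def chain_depth(instrs, merge_on_write):
    """Longest dependency chain in a list of (dest, sources) instructions.

    Toy model: if merge_on_write is True, every write also reads the old
    value of its destination, as a write that preserves upper bits must.
    """
    depth = {}  # register name -> chain depth of its current value
    longest = 0
    for dest, sources in instrs:
        deps = list(sources)
        if merge_on_write:
            deps.append(dest)  # preserved upper bits make the old value an input
        d = 1 + max((depth.get(r, 0) for r in deps), default=0)
        depth[dest] = d
        longest = max(longest, d)
    return longest

# Four independent writes to the same register, e.g. four unrelated loads
program = [("rax", []), ("rax", []), ("rax", []), ("rax", [])]

print(chain_depth(program, merge_on_write=False))  # 1
print(chain_depth(program, merge_on_write=True))   # 4
```

With zero-extension, the four writes are independent and renaming can overlap them. With merging, they form a serial chain even though nothing ever reads the upper bits.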