Question:
I understand in x86_64 assembly there is for example the (64 bit) rax register, but it can also be accessed as a 32 bit register, eax, 16 bit, ax, and 8 bit, al. In what situation would I not just use the full 64 bits, and why, what advantage would there be?
As an example, with this simple hello world program:
section .data
msg: db "Hello World!", 0x0a, 0x00
len: equ $-msg
section .text
global start
start:
mov rax, 0x2000004 ; System call write = 4
mov rdi, 1 ; Write to standard out = 1
mov rsi, msg ; The address of hello_world string
mov rdx, len ; The size to write
syscall ; Invoke the kernel
mov rax, 0x2000001 ; System call number for exit = 1
mov rdi, 0 ; Exit success = 0
syscall ; Invoke the kernel
rdi and rdx, at least, only need 8 bits and not 64, right? But if I change them to dil and dl, respectively (their lower 8-bit equivalents), the program assembles and links but doesn't output anything.
However, it still works if I use eax, edi and edx, so should I use those rather than the full 64-bits? Why or why not?
Answer 1:
First and foremost would be when loading a smaller (e.g. 8-bit) value from memory (reading a char, working on a data structure, deserialising a network packet, etc.) into a register.
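MOV AL, [0x1234]
versus
MOV RAX, [0x1234]
SHR RAX, 56
; assuming there are actually 8 accessible bytes at 0x1234.
; x86 is little-endian, so the byte at 0x1234 lands in the low 8 bits
; of RAX; to isolate it you'd need AND RAX, 0xFF (or MOVZX).
; SHR RAX, 56 instead leaves the byte from 0x123B in AL.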
Or, of course, writing said value back to memory.
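As a rough illustration of which bytes each load picks up, here is a C model. It is not assembly: `load_byte` and `load_qword_le` are hypothetical helpers standing in for `MOV AL, [addr]` and `MOV RAX, [addr]`. Because x86-64 is little-endian, the byte at the lowest address lands in bits 0..7 of the qword. Masking with 0xFF therefore recovers it, while shifting right by 56 yields the byte at offset 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "MOV AL, [addr]": touches exactly one byte of memory. */
uint8_t load_byte(const uint8_t *mem, size_t size, size_t off) {
    assert(off < size);
    return mem[off];
}

/* Model of "MOV RAX, [addr]" on little-endian x86-64: mem[off] becomes
   bits 0..7 and mem[off + 7] becomes bits 56..63. */
uint64_t load_qword_le(const uint8_t *mem, size_t size, size_t off) {
    assert(off + 8 <= size); /* all 8 bytes must be accessible */
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)mem[off + i] << (8 * i);
    return v;
}
```

With memory bytes 11 22 33 44 55 66 77 88, the qword load gives 0x8877665544332211. The low byte (0x11) matches the single-byte load.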
(Edit, like 6 years later):
Since this keeps coming up:
MOV AL, [0x1234]
- only reads a single byte of memory at 0x1234 (the inverse would only overwrite a single byte of memory)
- keeps whatever was in the other 56 bits of RAX
- This creates a dependency between the past and future values of RAX, so the CPU can't optimise the instruction using register renaming.
By contrast:
MOV RAX, [0x1234]
- reads 8 bytes of memory starting at 0x1234 (the inverse would overwrite 8 bytes of memory)
- overwrites all of RAX
- assumes the bytes in memory have the same endianness as the CPU (often not true in network packets, hence my SHR instruction years ago)
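For the network-packet case, here is a minimal C sketch. It assumes an 8-byte big-endian (network order) field. The little-endian load that `MOV RAX, [addr]` performs is followed by a byte reversal, which is what x86's `BSWAP` instruction does. `read_be64` and `bswap64` are hypothetical names, and the loop-based swap is written for clarity rather than speed.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value (what BSWAP does to RAX). */
uint64_t bswap64(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF); /* append the current lowest byte */
        v >>= 8;
    }
    return r;
}

/* Read a big-endian 64-bit field; p must point to 8 readable bytes.
   Step 1 mimics MOV RAX, [p] (little-endian load), step 2 mimics BSWAP. */
uint64_t read_be64(const uint8_t *p) {
    uint64_t le = 0;
    for (int i = 0; i < 8; i++)
        le |= (uint64_t)p[i] << (8 * i);
    return bswap64(le);
}
```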
Also important to note:
MOV EAX, [0x1234]
- reads 4 bytes of memory starting at 0x1234 (the inverse would overwrite 4 bytes of memory)
- overwrites all of RAX, but the upper 32 bits will all be zero
- see: Why do most x64 instructions zero the upper part of a 32 bit register
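Here is a small C model of the difference, where `rax` stands in for the register and the helper names are made up. A 32-bit write discards the old upper half. A 16-bit write to AX keeps bits 16..63, just as an 8-bit write to AL keeps bits 8..63.

```c
#include <assert.h>
#include <stdint.h>

/* Writing EAX in 64-bit mode: bits 32..63 of RAX are zeroed, so the old
   value of RAX plays no part in the result. */
uint64_t write_eax(uint64_t rax_old, uint32_t value) {
    (void)rax_old; /* deliberately unused: a 32-bit write does not merge */
    return (uint64_t)value;
}

/* Writing AX: only bits 0..15 change; bits 16..63 are kept. */
uint64_t write_ax(uint64_t rax_old, uint16_t value) {
    return (rax_old & ~(uint64_t)0xFFFF) | value;
}
```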
Then, as mentioned in the comments, there is:
MOVZX EAX, byte [0x1234]
- only reads a single byte of memory at 0x1234
- zero-extends the byte to fill all of EAX (and thus RAX), eliminating the dependency on the old value and allowing register renaming optimisations
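To contrast the two byte loads, the sketch below models `MOV AL, [addr]` and `MOVZX EAX, byte [addr]` as C functions of the old RAX value. These are hypothetical names, not a real API. The first function's result changes with `rax_old`, which is the dependency described above; the second ignores it entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MOV AL, [addr]: reads one byte and merges it into bits 0..7;
   bits 8..63 come from the old RAX, so the result depends on it. */
uint64_t mov_al_byte(uint64_t rax_old, const uint8_t *mem, size_t off) {
    return (rax_old & ~(uint64_t)0xFF) | mem[off];
}

/* MOVZX EAX, byte [addr]: reads one byte, zero-extends it to 32 bits,
   and the 32-bit write clears bits 32..63, so the old RAX is irrelevant. */
uint64_t movzx_eax_byte(uint64_t rax_old, const uint8_t *mem, size_t off) {
    (void)rax_old; /* deliberately unused: no dependency on the old value */
    return (uint32_t)mem[off];
}
```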
In all of these cases, if you want to write from the 'A' register into memory you'd have to pick your width:
MOV [0x1234], AL ; write a byte (8 bits)
MOV [0x1234], AX ; write a word (16 bits)
MOV [0x1234], EAX ; write a dword (32 bits)
MOV [0x1234], RAX ; write a qword (64 bits)
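The store side can be modelled the same way. In this sketch, `store_reg` is a hypothetical helper and `width` is 1, 2, 4 or 8 for AL, AX, EAX or RAX. The low `width` bytes are written in little-endian order, and the neighbouring bytes are left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "MOV [addr], AL/AX/EAX/RAX": store the low `width` bytes of
   rax, least significant byte first, without touching other bytes. */
void store_reg(uint8_t *mem, size_t size, size_t off, uint64_t rax, int width) {
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    assert(off + (size_t)width <= size);
    for (int i = 0; i < width; i++)
        mem[off + i] = (uint8_t)(rax >> (8 * i));
}
```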
Answer 2:
You are asking several questions here.
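If you load only the low 8 bits of a register, the rest of the register keeps its previous value. That would explain why your system call got the wrong parameters. The upper 56 bits of RDI and RDX still held whatever was in them before, so write() did not see exactly 1 as the file descriptor and 14 as the length.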
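Applied to the question's program, a C model of the two choices might look like this. The starting values of RDI and RDX are arbitrary placeholders, not what any particular OS actually leaves in those registers. The point is only this: with `mov dil, 1` and `mov dl, 14`, whatever was in the upper bits survives, while `mov edi, 1` and `mov edx, 14` produce exactly 1 and 14. Here 14 is `len` for "Hello World!", 0x0a, 0x00.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register state before the write() call. */
typedef struct { uint64_t rdi, rdx; } regs;

/* "mov dil, imm8" / "mov dl, imm8": only the low byte changes. */
static uint64_t set_low8(uint64_t reg, uint8_t v) {
    return (reg & ~(uint64_t)0xFF) | v;
}

/* "mov edi, imm32" / "mov edx, imm32": the upper 32 bits are zeroed. */
static uint64_t set_low32(uint64_t reg, uint32_t v) {
    (void)reg; /* old contents are discarded */
    return v;
}
```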
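One reason for using 32-bit registers when that is all you need is that many instructions using EAX or EBX are one byte shorter than those using RAX or RBX, because the 64-bit forms need a REX.W prefix byte. Constants loaded into the register can also be shorter, since a 4-byte immediate written to a 32-bit register is zero-extended into the full 64-bit register.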
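To see where the savings come from, here is a C sketch that emits the machine-code bytes for two encodings of loading the question's constant 0x2000004. The encoder function names are hypothetical.

- `mov eax, imm32` is opcode B8 plus a 4-byte immediate (5 bytes), and the CPU zero-extends the result into RAX.
- `mov rax, imm64` adds the REX.W prefix 0x48 and an 8-byte immediate (10 bytes).

There is also a 7-byte form (REX.W C7 /0 with a sign-extended 4-byte immediate). Which encoding an assembler picks for `mov rax, 0x2000004` depends on the assembler and its settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "mov eax, imm32": opcode B8+rd (rd = 0 for EAX), then the immediate
   in little-endian byte order. Returns the instruction length. */
size_t encode_mov_eax_imm32(uint8_t out[5], uint32_t imm) {
    out[0] = 0xB8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(imm >> (8 * i));
    return 5;
}

/* "mov rax, imm64": REX.W prefix (0x48), opcode B8, 8-byte immediate. */
size_t encode_mov_rax_imm64(uint8_t out[10], uint64_t imm) {
    out[0] = 0x48;
    out[1] = 0xB8;
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));
    return 10;
}
```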
The instruction set has evolved over a long time and has quite a few quirks!
Answer 3:
If you only need 32-bit registers, you can safely use them; this is fine in 64-bit mode. But if you only need 16-bit or 8-bit registers, try to avoid them, or always use movzx/movsx into a 32-bit register to fill the remaining bits (with zeros or with copies of the sign bit, respectively). It is well known that in x86-64, using 32-bit operands clears the upper bits of the 64-bit register. The main purpose of this is to avoid false dependency chains.
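Here is a C model of the extension choices for a source byte of 0xFF; the function names are hypothetical. MOVZX fills with zeros and MOVSX fills with copies of the sign bit. With a 32-bit destination, the upper 32 bits of RAX end up zero either way.

```c
#include <assert.h>
#include <stdint.h>

/* movzx eax, byte: bits 8..31 become 0; the 32-bit write clears 32..63. */
uint64_t movzx_eax_from_byte(uint8_t b) {
    return (uint32_t)b;
}

/* movsx eax, byte: copy the sign bit into bits 8..31; the 32-bit
   write still clears bits 32..63. */
uint64_t movsx_eax_from_byte(uint8_t b) {
    uint32_t v = b;
    if (b & 0x80)
        v |= 0xFFFFFF00u;
    return v;
}

/* movsx rax, byte: copy the sign bit into bits 8..63. */
uint64_t movsx_rax_from_byte(uint8_t b) {
    uint64_t v = b;
    if (b & 0x80)
        v |= 0xFFFFFFFFFFFFFF00ULL;
    return v;
}
```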
Please refer to section 3.4.1.1 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1:
32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register
Breaking dependency chains allows instructions to execute in parallel and out of program order, using the out-of-order execution engine that Intel CPUs have implemented since the Pentium Pro in 1995.
A quote from the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Section 3.5.1.8:
Code sequences that modifies partial register can experience some delay in its dependency chain, but can be avoided by using dependency breaking idioms. In processors based on Intel Core micro-architecture, a number of instructions can help clear execution dependency when software uses these instruction to clear register content to zero. Break dependencies on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.
Assembly/Compiler Coding Rule 37. (M impact, MH generality): Break dependencies on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.
On x86-64, MOVZX and MOV with a 32-bit destination are equivalent in this respect: both break dependency chains.
That's why your code can execute faster if you always clear the upper bits of the full register when working with a smaller one. When those bits are always cleared, there is no dependency on the register's previous value, so the CPU can rename the register internally.
Register renaming is a technique used internally by a CPU that eliminates the false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them.
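Here is a deliberately simplified model of the idea in C. Each instruction loads a byte into the 'A' register. A merging write must wait for the previous RAX value, while a full 32-bit write starts a fresh chain. The model assumes every 8-bit write merges into the full register. That matches the description above but not every microarchitecture: some Intel cores rename AL separately and pay a merge cost later instead. `longest_dependency_chain` is a hypothetical helper, not a CPU simulator.

```c
#include <assert.h>
#include <stddef.h>

/* MERGE_INTO_AL models MOV AL, ... ; FULL_WRITE_EAX models MOVZX EAX, ...
   or any other 32-bit write. */
typedef enum { MERGE_INTO_AL, FULL_WRITE_EAX } write_kind;

/* Length of the longest chain of writes that must run one after another. */
int longest_dependency_chain(const write_kind *seq, size_t n) {
    int depth = 0;   /* chain depth of the current RAX value (0 = initial) */
    int longest = 0;
    for (size_t i = 0; i < n; i++) {
        /* a merge extends the chain; a full write starts a new one */
        depth = (seq[i] == MERGE_INTO_AL) ? depth + 1 : 1;
        if (depth > longest)
            longest = depth;
    }
    return longest;
}
```

Four merging byte loads form a chain of length 4. Four full writes are all independent (length 1), so an out-of-order core is free to overlap them.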
Answer 4:
If you want to work with only an 8-bit quantity, then you'd work with the AL register. The same goes for AX (16 bits) and EAX (32 bits).
For example, you could have a 64-bit value that contains two 32-bit values. You can read the low 32 bits through the EAX register. Note, however, that writing EAX zeroes the upper 32 bits of RAX, so modifying the low half that way destroys the high half. When you want to work on the high 32 bits, you can swap the two 32-bit quantities (e.g. with ROR RAX, 32) so that the high bits are now in EAX.
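Here is a C model of this technique, assuming `rax` holds two packed 32-bit values; the helpers are hypothetical. The last check shows that writing EAX keeps only the new low half.

```c
#include <assert.h>
#include <stdint.h>

/* "ror rax, 32": rotating by half the width swaps the two dwords. */
uint64_t swap_dwords(uint64_t rax) {
    return (rax >> 32) | (rax << 32);
}

/* Reading EAX: just the low 32 bits; RAX itself is unchanged. */
uint32_t read_eax(uint64_t rax) {
    return (uint32_t)rax;
}

/* Writing EAX (e.g. "add eax, 1"): zero-extends, so bits 32..63 are lost. */
uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax; /* old high half is discarded */
    return v;
}
```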
Answer 5:
64 bits is the largest unit a general-purpose register can work with at once. That doesn't mean that's how much you need to use.
If you need 8 bits, use 8. If you need 16, use 16. If it doesn't matter how many bits, then it doesn't matter how many you use.
Admittedly, when on a 64-bit processor, there's very little overhead to use the full 64 bits. But if, for example, you are calculating a byte value, working with a byte will mean the result will already be the correct size.
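As an illustration of that last point, here is a hypothetical 8-bit additive checksum in C. With a byte-sized accumulator (roughly like repeating `add al, [rsi]` in a loop), every addition wraps modulo 256, so the result is already a byte. A 64-bit accumulator needs a final mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-sized accumulator: each addition wraps modulo 256. */
uint8_t checksum8(const uint8_t *data, size_t n) {
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}

/* 64-bit accumulator: same result, but only after masking to 8 bits. */
uint8_t checksum8_wide(const uint8_t *data, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return (uint8_t)(sum & 0xFF);
}
```

For the bytes 200, 100, 10, both return 54 (310 mod 256).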