I understand in x86_64 assembly there is for example the (64 bit) rax register, but it can also be accessed as a 32 bit register, eax, 16 bit, ax, and 8 bit, al. In what situation would I not just use the full 64 bits, and why, what advantage would there be?
As an example, with this simple hello world program:
section .data
msg: db "Hello World!", 0x0a, 0x00
len: equ $-msg
section .text
global start
start:
mov rax, 0x2000004 ; System call write = 4
mov rdi, 1 ; Write to standard out = 1
mov rsi, msg ; The address of hello_world string
mov rdx, len ; The size to write
syscall ; Invoke the kernel
mov rax, 0x2000001 ; System call number for exit = 1
mov rdi, 0 ; Exit success = 0
syscall ; Invoke the kernel
rdi and rdx, at least, only need 8 bits and not 64, right? But if I change them to dil and dl, respectively (their lower 8-bit equivalents), the program assembles and links but doesn't output anything.
However, it still works if I use eax, edi and edx, so should I use those rather than the full 64-bits? Why or why not?
64-bit
is the largest piece of memory you can work with as a single unit. That doesn't mean that's how much you need to use.If you need 8 bits, use 8. If you need 16, use 16. If it doesn't matter how many bits, then it doesn't matter how many you use.
Admittedly, when on a 64-bit processor, there's very little overhead to use the full 64 bits. But if, for example, you are calculating a byte value, working with a byte will mean the result will already be the correct size.
If you just need 32-bit registers, you can safely work with them, this is OK under 64-bit. But if you just need 16-bit or 8-bit registers, try to avoid them or always use movzx/movsx to clear the remaining bits. It is well known that under x86-64, using 32-bit operands clears the higher bits of the 64-bit register. The main purpose of this is avoid false dependency chains.
Please refer to the relevant section - 3.4.1.1 - of The Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1:
Breaking dependency chains allows the instructions to execute in parallel, in random order, by the Out-of-Order algorithm implemented internally by CPUs since Pentium Pro in 1995.
A Quote from the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Section 3.5.1.8:
The MOVZX and MOV with 32-bit operands for x64 are equivalent - they all break dependency chains.
That's why your code will execute faster if you always try clear the highest bits of larger registers when using smaller registers. When the bits are always cleard, thre are no dependencies on the previous value of the register, the CPU can internally rename the registers.
Register renaming is a technique used internally by a CPU that eliminates the false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them.
If you want to work with only an 8-bit quantity, then you'd work with the AL register. Same for AX and EAX.
For example, you could have a 64-bit value that contains two 32-bit values. You can work on the low 32-bits by accessing the EAX register. When you want to work on the high 32-bits, you can swap the two 32-bit quantities (reverse the DWORDs in the register) so that the high bits are now in EAX.
First and foremost would be when loading a smaller (e.g. 8-bit) value from memory (reading a char, working on a data structure, deserialising a network packet, etc.) into a register.
versus
Or, of course, writing said value back to memory.
(Edit, like 6 years later):
Since this keeps coming up:
By contrast:
SHR
instruction years ago)Also important to note:
Then, as mentioned in the comments, there is:
In all of these cases, if you want to write from the 'A' register into memory you'd have to pick your width:
You are asking several questions here.
If you just load the low 8 bits of a register, the rest of the register will keep its previous value. That can explain why your system call got the wrong parameters.
One reason for using 32 bits when that is all you need is that many instructions using EAX or EBX are one byte shorter than those using RAX or RBX. It might also mean that constants loaded into the register are shorter.
The instruction set has evolved over a long time and has quite a few quirks!