Why does NASM use the 0x89 opcode (137) when it assembles a MOV instruction between two registers?
Here is an example of code assembled using NASM:
55 push ebp
89E5 mov ebp, esp
83EC04 sub esp, byte +0x4
31C0 xor eax, eax
C9 leave
C3 ret
I wanted something like this:
55 push ebp
8BEC mov ebp, esp
83EC04 sub esp, byte +0x4
33C0 xor eax, eax
C9 leave
C3 ret
The reason I wanted 0x8B was: if you view the binary representation of the MOV instruction, it looks like this in NASM:
Opcode Mod Reg R/M
10001001 11 100 101 (89 E5)
The confusing part is that the reg field holds the second (source) operand.
NASM's encoding is: 0x89, mod = 11, source_reg, destination_reg
whereas
the MOV instruction is written mov destination_reg, source_reg
The two opcodes are equivalent when both operands are registers. That's a redundancy in the x86 encoding, and the assembler can choose whichever one it likes.
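To check that claim by hand, here is a minimal Python sketch that decodes the two-byte register-to-register MOV. It assumes 32-bit operand size with no prefixes, and the names REGS32 and decode_mov_reg_reg are made up for this illustration; it is not how any real disassembler is organised.

```python
# 32-bit register names indexed by their 3-bit encoding.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_reg_reg(code: bytes) -> str:
    """Decode a two-byte register-to-register MOV (opcode 0x89 or 0x8B).

    Assumes 32-bit operand size and no prefixes (illustration only).
    """
    opcode, modrm = code[0], code[1]
    mod = modrm >> 6              # top two bits
    reg = (modrm >> 3) & 0b111    # middle three bits
    rm = modrm & 0b111            # low three bits
    if mod != 0b11:
        raise ValueError("only the register-direct form (mod = 11) is handled")
    if opcode == 0x89:            # MOV r/m32, r32: r/m is the destination
        dst, src = rm, reg
    elif opcode == 0x8B:          # MOV r32, r/m32: reg is the destination
        dst, src = reg, rm
    else:
        raise ValueError("not a 32-bit MOV opcode handled by this sketch")
    return f"mov {REGS32[dst]}, {REGS32[src]}"

print(decode_mov_reg_reg(bytes.fromhex("89E5")))  # NASM's choice
print(decode_mov_reg_reg(bytes.fromhex("8BEC")))  # the encoding the question asked for
```

Both calls print the same instruction text, because the only difference between the encodings is which ModR/M field holds the destination.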
A typical two-operand x86 instruction has two opcodes. The first takes a register as its first operand and a register or a memory location as its second (abbreviated "reg, reg/mem32" in the opcode reference or "Gv, Ev" in the opcode table). The operands of the second opcode are reversed (abbreviated "reg/mem32, reg" or "Ev, Gv"). This makes sense: the processor must know whether it copies to memory or from memory. But when both operands are registers, the encoding becomes redundant:
; mod reg r/m
03C3 add eax, ebx ; 11 000 011
01D8 add eax, ebx ; 11 011 000
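The redundancy can also be seen from the encoder's side. The following Python sketch builds both byte sequences for a register-to-register instruction. It assumes 32-bit registers and covers only a handful of opcodes; the table BASE and the function encodings are hypothetical names for this example, not part of NASM or GAS.

```python
# 32-bit register names indexed by their 3-bit encoding.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Opcodes of the "Ev, Gv" (r/m32, r32) form. Setting bit 1 (adding 2)
# gives the "Gv, Ev" (r32, r/m32) form of the same instruction.
BASE = {"add": 0x01, "adc": 0x11, "sub": 0x29, "xor": 0x31, "mov": 0x89}

def encodings(mnemonic: str, dst: str, src: str) -> tuple[bytes, bytes]:
    """Return both encodings of `mnemonic dst, src` for two 32-bit registers."""
    base = BASE[mnemonic]
    d, s = REGS32.index(dst), REGS32.index(src)
    # Form 1: destination in r/m, source in reg (mod = 11 means register-direct).
    form_rm_reg = bytes([base, 0xC0 | (s << 3) | d])
    # Form 2: destination in reg, source in r/m.
    form_reg_rm = bytes([base | 0x02, 0xC0 | (d << 3) | s])
    return form_rm_reg, form_reg_rm

for args in [("add", "eax", "ebx"), ("mov", "ebp", "esp"), ("xor", "eax", "eax")]:
    first, second = encodings(*args)
    print(f"{args[0]} {args[1]}, {args[2]}: {first.hex().upper()} or {second.hex().upper()}")
```

For add eax, ebx this yields 01D8 and 03C3, the two encodings listed above; for mov ebp, esp it yields 89E5 and 8BEC, the pair from the question.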
There are many more redundant encodings than just this reg/reg case.
Different assemblers emit different opcodes, so this choice can be used to identify the assembler that produced a binary.
Some assemblers let you choose the encoding. For example, GAS emits the other encoding if you append .s to the mnemonic:
10 de adcb %bl,%dh
12 f3 adcb.s %bl,%dh
See also: What is the ".s" suffix in x86 instructions?