Why does nasm use 0x89 when it assembles a MOV ins

Posted 2019-08-06 08:30

Question:

Why does NASM use the 0x89 opcode (137) when it assembles a MOV instruction between two registers?

Here is an example of code assembled using NASM:

55      push ebp
89E5    mov ebp, esp
83EC04  sub esp, byte +0x4
31C0    xor eax, eax
C9      leave
C3      ret

I wanted something like this:

55      push ebp
8BEC    mov ebp, esp
83EC04  sub esp, byte +0x4
33C0    xor eax, eax
C9      leave
C3      ret

The reason I wanted 0x8B is this: the binary representation of the MOV instruction that NASM emits looks like this:

Opcode   Mod Reg R/M
10001001 11  100 101 (89 E5)

The confusing part is that the reg field holds the second (source) operand.

The encoding NASM emits is: 0x89 11 source_reg destination_reg, whereas
the MOV instruction is written mov destination_reg, source_reg.

Answer 1:

The two encodings are equivalent: 89 E5 and 8B EC both mean mov ebp, esp. This is a redundancy in the x86 instruction encoding, and the assembler is free to choose either one.

A typical two-operand x86 instruction has two opcodes. The first takes a register as the first operand and a register or a memory location as the second (abbreviated "reg, reg/mem32" in the opcode reference or "Gv, Ev" in the opcode table). The second takes the operands in the reverse order (abbreviated "reg/mem32, reg" or "Ev, Gv"). This makes sense: the processor must know whether it is copying to memory or from memory. But when both operands are registers, either opcode can express the same operation, so the encoding becomes redundant:

                  ; mod reg r/m
03C3 add eax, ebx ;  11 000 011
01D8 add eax, ebx ;  11 011 000
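
To make the redundancy concrete, here is a minimal Python sketch that decodes only the two-byte register-to-register forms discussed here. The function name decode_reg_reg and the small opcode table are hypothetical helpers for illustration, not part of NASM or any library. It assumes 32-bit operand size and treats bit 1 of the opcode as the direction bit that selects which ModR/M field is the destination.

```python
# Register names indexed by their 3-bit encoding (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Base opcodes of the "r/m32, r32" form. Setting bit 1 (the direction bit)
# gives the "r32, r/m32" form of the same operation.
MNEMONICS = {0x01: "add", 0x31: "xor", 0x89: "mov"}


def decode_reg_reg(code: bytes) -> str:
    """Decode a two-byte opcode + ModR/M pair with mod = 11 (illustrative only)."""
    opcode, modrm = code
    mod = modrm >> 6          # top two bits
    reg = (modrm >> 3) & 7    # middle three bits
    rm = modrm & 7            # low three bits
    if mod != 0b11:
        raise ValueError("only register-to-register forms (mod = 11) are handled")
    mnemonic = MNEMONICS[opcode & ~0x02]
    if opcode & 0x02:
        dst, src = reg, rm    # direction bit set: reg field is the destination
    else:
        dst, src = rm, reg    # direction bit clear: r/m field is the destination
    return f"{mnemonic} {REG32[dst]}, {REG32[src]}"


for pair in ("89E5", "8BEC", "01D8", "03C3", "31C0", "33C0"):
    print(pair, decode_reg_reg(bytes.fromhex(pair)))
```

Each pair of byte sequences decodes to the same text, which is why a disassembler cannot tell you which of the two the programmer "meant".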

There are many more redundant encodings than just this reg/reg case.

Different assemblers choose different encodings, so the choice can be used to identify which assembler produced a binary.

Some assemblers allow you to choose the encoding. For example, GAS emits the other encoding if you append the .s suffix to the mnemonic:

10 de   adcb   %bl,%dh
12 f3   adcb.s %bl,%dh

See also: What is the ".s" suffix in x86 instructions?
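
Going the other way, the two byte sequences in the adcb listing above can be reproduced by hand. The following is a sketch only: both_encodings is a hypothetical helper, it covers just 8-bit register-to-register operands, and it assumes the base opcode passed in is the "r/m8, r8" form (0x10 for ADC).

```python
# 8-bit register names indexed by their 3-bit encoding (no REX prefix).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def both_encodings(base_opcode: int, dst: str, src: str) -> tuple:
    """Return the two equivalent encodings of `op dst, src` (Intel operand order)."""
    d, s = REG8.index(dst), REG8.index(src)
    # "r/m8, r8" form: destination in the r/m field, source in the reg field.
    rm_dest_form = bytes([base_opcode, 0xC0 | (s << 3) | d])
    # "r8, r/m8" form: direction bit set, destination in the reg field.
    reg_dest_form = bytes([base_opcode | 0x02, 0xC0 | (d << 3) | s])
    return rm_dest_form, reg_dest_form


# adc dh, bl in Intel syntax is adcb %bl,%dh in AT&T syntax.
first, second = both_encodings(0x10, "dh", "bl")
print(first.hex(" "), "/", second.hex(" "))
```

The first result corresponds to the plain adcb line and the second to the adcb.s line.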



Tags: assembly nasm