When encoding the instruction `cmpw %ax -5` for x86-64, the Intel instruction set reference manual gives me two opcodes to choose from:
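| Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
|---|---|---|---|---|---|
| 3D iw | CMP AX, imm16 | I | Valid | Valid | Compare imm16 with AX. |
| 83 /7 ib | CMP r/m16, imm8 | MI | Valid | Valid | Compare imm8 with r/m16. |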
So there will be two encoding results:
66 3d fb ff ; this for opcode 3d
66 83 f8 fb ; this for opcode 83
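As a minimal sketch of where those bytes come from, the following Python builds both encodings by hand. The function names and the `AX` constant are hypothetical helpers for illustration; only the opcode bytes, the `/7` extension and the `66` prefix come from the manual entries quoted above.

```python
import struct

OPERAND_SIZE_PREFIX = 0x66  # selects 16-bit operand size in 32/64-bit mode
AX = 0                      # register number of AX in the ModRM rm field

def encode_cmp_ax_imm16(imm):
    """66 3D iw: CMP AX, imm16 (immediate stored little-endian)."""
    return bytes([OPERAND_SIZE_PREFIX, 0x3D]) + struct.pack("<h", imm)

def encode_cmp_r16_imm8(reg, imm):
    """66 83 /7 ib: CMP r/m16, imm8 with a register operand."""
    # ModRM: mod=11 (register direct), reg field=7 (the /7 extension), rm=register
    modrm = (0b11 << 6) | (7 << 3) | reg
    return bytes([OPERAND_SIZE_PREFIX, 0x83, modrm]) + struct.pack("<b", imm)

print(encode_cmp_ax_imm16(-5).hex(" "))      # opcode 3D form
print(encode_cmp_r16_imm8(AX, -5).hex(" "))  # opcode 83 form
```

The imm8 form relies on the CPU sign-extending `fb` to `fffb`, which is why -5 fits in a single byte there.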
Then which one is better?
I tried the online disassemblers below:
- https://defuse.ca/online-x86-assembler.htm#disassembly2
- https://onlinedisassembler.com/odaweb/
Both disassemble these to the original instruction. But why does `6683fb00` also work, while `663dfb` doesn't?
Both encodings are the same length, so that doesn't help us decide.
However, as @Michael Petch commented, the `imm16` encoding will cause an LCP stall in the decoders on Intel CPUs. (Because without the `66` operand-size prefix, it would be `3D imm32`, so the operand-size prefix changes the length of the rest of the instruction. This is why it's called a Length-Changing-Prefix stall. AFAIK, you'd get the same stall in 16bit code for using a 32bit immediate.)
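To make "length-changing" concrete, here is a rough sketch that models only the two CMP forms from the question in 32/64-bit mode. The function names are made up for illustration, and this is not a general x86 length decoder.

```python
def bytes_after_prefixes(opcode, has_66_prefix):
    """Length of opcode + ModRM + immediate for the two CMP forms only."""
    if opcode == 0x3D:
        # CMP AX/EAX, imm16/imm32: the immediate size follows the operand size
        imm_size = 2 if has_66_prefix else 4
        return 1 + imm_size
    if opcode == 0x83:
        # CMP r/m, imm8 with a register operand: opcode + ModRM + imm8
        return 1 + 1 + 1
    raise ValueError("only opcodes 3D and 83 are modelled in this sketch")

def is_length_changing(opcode):
    """True if adding the 66 prefix alters the length of the bytes that follow it."""
    return bytes_after_prefixes(opcode, True) != bytes_after_prefixes(opcode, False)

for opcode in (0x3D, 0x83):
    print(f"{opcode:02X}: length-changing = {is_length_changing(opcode)}")
```

For `3D` the prefix shrinks the rest of the instruction from 5 bytes to 3, while for `83` it stays at 3 either way, so only the former is a candidate for the stall.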
The `imm8` encoding doesn't cause a problem on any microarchitecture I know of, so favour it. See Agner Fog's microarch.pdf, and other links from the x86 tag wiki.
It can be worth using a longer instruction to avoid an LCP stall. (e.g. if you know the upper 16 bits of the register are zero or sign-extended, using 32bit operand size can avoid the LCP stall.)
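As an illustration of that trade-off, the sketch below compares the two encodings for an immediate that does not fit in imm8, so the `83 /7 ib` form is unavailable. The value `0x1234` and the function names are arbitrary choices for the example, and it assumes the upper 16 bits of EAX are known to be zero, as described above.

```python
import struct

def cmp_ax_imm16(imm):
    """66 3D iw: 16-bit compare, needs the length-changing 66 prefix."""
    return bytes([0x66, 0x3D]) + struct.pack("<H", imm)

def cmp_eax_imm32(imm):
    """3D id: 32-bit compare, no operand-size prefix in 32/64-bit mode."""
    return bytes([0x3D]) + struct.pack("<I", imm)

short_form = cmp_ax_imm16(0x1234)
long_form = cmp_eax_imm32(0x1234)
print(short_form.hex(" "), f"({len(short_form)} bytes, has 66 prefix)")
print(long_form.hex(" "), f"({len(long_form)} bytes, no prefix)")
```

The 32-bit form costs one extra byte but has no prefix that changes the immediate's size.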
Intel SnB-family CPUs have a uop cache, so instructions don't always have to be re-decoded before executing. Still, the uop cache is small, so avoiding LCP stalls is worth it.
Of course, if you're tuning for AMD, then this isn't a factor. I forget if Atom and Silvermont decoders also have LCP stalls.
Re: part 2: `663d` is the prefix + opcode for `cmp ax, imm16`. `663dfb` doesn't "work" because it consumes the first byte of the following instruction. When the decoder sees `66 3D`, it grabs the next 2 bytes from the instruction stream as the immediate.
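A toy model of that behaviour, handling only the `66 3D` form: the function name is hypothetical, and the trailing `90` (a `nop`) is an assumed next instruction added purely for illustration.

```python
def split_cmp_ax_imm16(stream):
    """Return (instruction_bytes, remaining_bytes) for a stream starting with 66 3D."""
    if stream[:2] != bytes([0x66, 0x3D]):
        raise ValueError("sketch only handles the 66 3D form")
    length = 2 + 2  # prefix + opcode, then a 2-byte immediate
    if len(stream) < length:
        raise ValueError("truncated: the immediate needs 2 bytes")
    return stream[:length], stream[length:]

# 66 3d fb followed by what was meant to be a separate nop (0x90)
insn, rest = split_cmp_ax_imm16(bytes.fromhex("663dfb90"))
imm = int.from_bytes(insn[2:], "little")
print(insn.hex(" "), "-> cmp ax,", hex(imm), "| left over:", rest.hex(" ") or "(nothing)")
```

The `nop` is swallowed as the high byte of the immediate, so the stream decodes as a single `cmp ax, 0x90fb` instead of a compare followed by a `nop`.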