As a quick reminder, the x86 architecture defines 0x0F 0x1F [mod R/M]
as a multi-byte NOP.
Now I'm looking at the specific case of an 8-byte NOP: I have
0x0F 0x1F 0x84 0x__ 0x__ 0x__ 0x__ 0x__
where the last 5 bytes have arbitrary values.
The third byte, [mod R/M], split up gives:
mod = 10b: the argument is reg1 + a DWORD-sized displacement
reg2 = 000b: (we don't care)
reg1 = 100b: indicates that the argument is instead the SIB byte + a DWORD-sized displacement.
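To make the field split above concrete, here is a minimal Python sketch. The helper name `split_modrm` is made up for illustration; what the question calls reg2 is the middle `reg` field and reg1 is the low `rm` field.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # top two bits: addressing form
    reg = (byte >> 3) & 0b111   # middle three bits ("reg2" above)
    rm = byte & 0b111           # low three bits ("reg1" above)
    return mod, reg, rm


mod, reg, rm = split_modrm(0x84)
print(f"mod={mod:02b}b reg={reg:03b}b rm={rm:03b}b")
```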
Now, as a concrete example, if I take
0x0F 0x1F 0x84 0x12 0x34 0x56 0x78 0x9A
I've got
SIB = 0x12
displacement = 0x9A785634: a DWORD
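The same breakdown can be checked mechanically. This is only a sketch in Python (variable names are illustrative): it pulls the SIB byte apart and reads the four displacement bytes as a little-endian DWORD.

```python
import struct

insn = bytes([0x0F, 0x1F, 0x84, 0x12, 0x34, 0x56, 0x78, 0x9A])

sib = insn[3]
scale = 1 << (sib >> 6)        # top two bits select a scale of 1, 2, 4 or 8
index = (sib >> 3) & 0b111     # index register number
base = sib & 0b111             # base register number

# The displacement is stored least significant byte first.
(disp,) = struct.unpack_from("<I", insn, 4)

print(f"SIB={sib:#04x} scale={scale} index={index} base={base} disp={disp:#010x}")
```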
Now I add the 0x66 instruction prefix to indicate that the displacement should be a WORD instead of a DWORD:
0x66 0x0F 0x1F 0x84 0x12 0x34 0x56 0x78 0x9A
I expect 0x78 0x9A to be 'cut off' and treated as a new instruction. However, when compiling this and running objdump on the resulting executable, it still uses all 4 bytes (a DWORD) as displacement.
Am I misunderstanding the meaning of 'displacement' in this context? Or does the 0x66 prefix not have any effect on multi-byte NOP instructions?
The 66H prefix overrides the size of the operand to 16 bits. It does not override the size of the address; if you want that, use 67H.
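As a rough illustration of the difference, the following Python sketch computes the length of a prefixed 0F 1F NOP. It is not a real disassembler: the function name is hypothetical, it only recognises the 66H and 67H prefixes, and it assumes 32-bit code (in 64-bit mode 67H selects 32-bit addressing, which still uses a SIB byte and a DWORD displacement, so the length would not shrink there).

```python
def nop_0f1f_length(code, default_addr_bits=32):
    """Length in bytes of the (optionally prefixed) 0F 1F NOP at the start of code."""
    i = 0
    addr_override = False
    while code[i] in (0x66, 0x67):
        if code[i] == 0x67:
            addr_override = True   # 66H is skipped: it never changes the length here
        i += 1
    if code[i:i + 2] != b"\x0f\x1f":
        raise ValueError("not a 0F 1F multi-byte NOP")
    modrm = code[i + 2]
    i += 3
    mod, rm = modrm >> 6, modrm & 0b111

    addr_bits = default_addr_bits
    if addr_override:
        addr_bits = 16 if default_addr_bits == 32 else 32

    if mod == 0b11:                # register operand: no SIB, no displacement
        return i
    if addr_bits == 32:
        if rm == 0b100:            # a SIB byte follows
            sib = code[i]
            i += 1
            if mod == 0b00 and (sib & 0b111) == 0b101:
                i += 4             # no base register, disp32 instead
        if mod == 0b00 and rm == 0b101:
            i += 4                 # disp32 only
        elif mod == 0b01:
            i += 1                 # disp8
        elif mod == 0b10:
            i += 4                 # disp32
    else:                          # 16-bit addressing has no SIB byte
        if mod == 0b00 and rm == 0b110:
            i += 2                 # disp16 only
        elif mod == 0b01:
            i += 1                 # disp8
        elif mod == 0b10:
            i += 2                 # disp16
    return i


tail = bytes([0x0F, 0x1F, 0x84, 0x12, 0x34, 0x56, 0x78, 0x9A])
print(nop_0f1f_length(tail))               # no prefix
print(nop_0f1f_length(b"\x66" + tail))     # operand-size prefix
print(nop_0f1f_length(b"\x67" + tail))     # address-size prefix
```

Under these assumptions, only the 67H variant ends early and leaves trailing bytes to be decoded as the next instruction, which matches what objdump shows for the 66H case in the question.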
Here's a list of all operands.
However, it is best not to create your own NOP instructions, but to stick to the recommended (multi-byte) NOPs.
According to AMD, the recommended multi-byte NOPs are as follows:
Table 4-9. Recommended Multi-Byte Sequence of NOP Instruction
Intel does not mind up to 3 redundant prefixes, so NOPs of up to 11 bytes can be constructed by prepending prefixes to the 8-byte form.
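A small Python sketch of that construction follows. The byte sequences for 1 to 8 bytes are the commonly cited recommended forms as I recall them from the vendor manuals, so check them against the current documentation before relying on them. Using 66H as the redundant prefix is my assumption, and the `nop` helper is hypothetical.

```python
# Commonly cited recommended NOP encodings, 1 to 8 bytes (verify against the manuals).
RECOMMENDED_NOPS = {
    1: "90",
    2: "66 90",
    3: "0F 1F 00",
    4: "0F 1F 40 00",
    5: "0F 1F 44 00 00",
    6: "66 0F 1F 44 00 00",
    7: "0F 1F 80 00 00 00 00",
    8: "0F 1F 84 00 00 00 00 00",
}


def nop(n):
    """Return an n-byte NOP (1 <= n <= 11) as a bytes object."""
    if not 1 <= n <= 11:
        raise ValueError("only 1 to 11 bytes are supported")
    extra = max(0, n - 8)          # at most three 66H prefixes on the 8-byte form
    return b"\x66" * extra + bytes.fromhex(RECOMMENDED_NOPS[n - extra])


for n in range(1, 12):
    print(n, nop(n).hex(" "))
```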
Of course, you can also eliminate NOPs by prefixing normal instructions with redundant prefixes, or by forcing the CPU to use longer versions of the same instruction.
The instructions with immediate operands have short and long versions.
Most assemblers will helpfully shorten all instructions for you, so you'll have to code the longer instructions yourself using db.
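As an illustration of the short and long forms, here is a hedged Python sketch that encodes ADD r32, imm both ways. The function name and the choice of ADD are mine; the opcodes used are 83 /0 ib (sign-extended 8-bit immediate) and 81 /0 id (full 32-bit immediate).

```python
import struct


def add_r32_imm(reg, imm, long_form=False):
    """Encode ADD r32, imm for a register number 0-7 (0 = EAX)."""
    modrm = 0xC0 | reg   # mod=11b (register direct), reg field 0 selects ADD, rm=reg
    if -128 <= imm <= 127 and not long_form:
        return bytes([0x83, modrm]) + struct.pack("<b", imm)   # 3 bytes
    return bytes([0x81, modrm]) + struct.pack("<i", imm)       # 6 bytes


print(add_r32_imm(0, 1).hex(" "))                   # short form of add eax, 1
print(add_r32_imm(0, 1, long_form=True).hex(" "))   # long form, same effect
```

The long form does the same work as the short one but takes three more bytes, so emitting those bytes with db pads the code by three bytes without adding a separate NOP.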
Interspersing these in strategic locations can help you align jump targets without having to incur delays due to the decoding or execution of a nop.
Remember that on most CPUs, executing NOPs still uses up resources.