Assembly - x86 call instruction and memory address

Posted 2019-05-07 01:37

Question:

I've been reading some assembly code and I've started seeing that call instructions are actually program counter relative.

However, whenever I debug with Visual Studio or WinDbg, the disassembly always shows call 0xFFFFFF ..., which to me reads as "jump to that absolute address".

Who is right? Is Visual Studio hiding the complexity of the instruction encoding and just showing what the program means? That is, does the debugger know it's a pc-relative instruction and, since it knows the pc, do the math for you?

Highly confused.

Answer 1:

If you're disassembling .o object files that haven't been linked yet, the call address will just be a placeholder to be filled in by the linker.

You can use objdump -drwC -Mintel to show the relocation types + symbol names from a .o. (The -r option is the key. Or -R for an already-linked shared library.)


It's more useful to the user to show the actual address of the jump target, rather than disassemble it as jcc eip-1234H or something. So yes, the disassembler does the math for you. Linked executables have a default load address (and an unlinked object file is simply treated as starting at zero), so the disassembler has a value for eip at every instruction, and this is usually present in disassembly output.

e.g. in some asm code I wrote (where I use symbol names that made it into the object file, so the loop branch target is actually visible to the disassembler):

objdump -M intel  -d rs-asmbench:
...
00000000004020a0 <.loop>:
  4020a0:       0f b6 c2                movzx  eax,dl
  4020a3:       0f b6 de                movzx  ebx,dh
   ...
  402166:       49 83 c3 10             add    r11,0x10
  40216a:       0f 85 30 ff ff ff       jne    4020a0 <.loop>

0000000000402170 <.last8>:
  402170:       0f b6 c2                movzx  eax,dl

Note that the jne is encoded as the two-byte opcode 0f 85 followed by a signed little-endian 32-bit displacement of -0xD0 bytes. (Jumps add their displacement to the value of e/rip after the jump, i.e. the address of the next instruction. The jump instruction itself is 6 bytes long, so the displacement has to be -0xD0, not just -0xCA.) 0x100 - 0xD0 = 0x30, which is the value of the least-significant byte of the 2's complement displacement.
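To check that arithmetic by hand, here is a minimal Python sketch that decodes the displacement from the jne bytes in the listing above and adds it to the address of the next instruction. The helper name branch_target is made up for this illustration; it is not part of objdump or any library.

```python
def branch_target(insn_addr: int, insn_bytes: bytes, disp_size: int = 4) -> int:
    """Hypothetical helper: absolute target of a relative branch.

    Assumes the displacement is the last `disp_size` bytes of the
    instruction, stored little-endian as a signed integer.
    """
    disp = int.from_bytes(insn_bytes[-disp_size:], "little", signed=True)
    next_insn = insn_addr + len(insn_bytes)  # rip already points past the branch
    return next_insn + disp

# 40216a: 0f 85 30 ff ff ff    jne 4020a0 <.loop>
jne = bytes.fromhex("0f8530ffffff")
disp = int.from_bytes(jne[2:], "little", signed=True)

print(hex(disp))                          # -0xd0
print(hex(branch_target(0x40216a, jne)))  # 0x4020a0
```

This is the same calculation the disassembler does before printing 4020a0 as the operand.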

In your question, you're talking about the call addresses being 0xFFFF..., which makes little sense unless that's just a placeholder, or you thought the non-0xFF bytes in the displacement were part of the opcode.
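To see where runs of ff bytes come from in a real encoding, here is a small Python sketch that builds the bytes of a call rel32 (opcode e8) for a few backward displacements. The function name encode_call and the sample displacements are just picked for illustration.

```python
def encode_call(disp: int) -> str:
    """Hypothetical helper: hex bytes of `call rel32` for a given displacement."""
    raw = b"\xe8" + disp.to_bytes(4, "little", signed=True)
    return " ".join(f"{b:02x}" for b in raw)

# Any small negative displacement sign-extends to ff in the high bytes.
for disp in (-0x10, -0xd0, -0x1234):
    print(encode_call(disp))
# e8 f0 ff ff ff
# e8 30 ff ff ff
# e8 cc ed ff ff
```

So a backward call to a nearby function has ff in its upper displacement bytes, but the low bytes still carry the actual distance.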

Before linking, references to external symbols look like this:

objdump -M intel -d main.o
  ...
  a5:   31 f6                   xor    esi,esi
  a7:   e8 00 00 00 00          call   ac <main+0xac>
  ac:   4c 63 e0                movsxd r12,eax
  af:   ba 00 00 00 00          mov    edx,0x0
  b4:   48 89 de                mov    rsi,rbx
  b7:   44 89 f7                mov    edi,r14d
  ba:   e8 00 00 00 00          call   bf <main+0xbf>
  bf:   83 f8 ff                cmp    eax,0xffffffff
  c2:   75 cc                   jne    90 <main+0x90>
  ...

Notice how the call instructions have their relative displacement = 0. So before the linker has slotted in the actual relative value, they encode a call with a target of the instruction right after the call. (i.e. RIP = RIP+0). The call bf is immediately followed by an instruction that starts at 0xbf from the start of the section. The other call has a different target address because it's at a different place in the file. (gcc puts main in its own section: .text.startup).
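As a sanity check, this Python sketch redoes the target calculation for three instructions from the main.o listing: both placeholder calls and the short jne, which uses an 8-bit displacement. rel_target is a name invented for this example.

```python
def rel_target(addr: int, raw: bytes, disp_size: int) -> int:
    """Hypothetical helper: target = address of next instruction + signed displacement."""
    disp = int.from_bytes(raw[-disp_size:], "little", signed=True)
    return addr + len(raw) + disp

# a7: e8 00 00 00 00    call ac <main+0xac>
call_a7 = rel_target(0xa7, bytes.fromhex("e800000000"), 4)
# ba: e8 00 00 00 00    call bf <main+0xbf>
call_ba = rel_target(0xba, bytes.fromhex("e800000000"), 4)
# c2: 75 cc             jne 90 <main+0x90>   (rel8: 0xcc is -0x34)
jne_c2 = rel_target(0xc2, bytes.fromhex("75cc"), 1)

print(hex(call_a7), hex(call_ba), hex(jne_c2))  # 0xac 0xbf 0x90
```

The two calls have identical bytes and only differ in where they sit, which is why the displayed targets differ.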

So, if you want to make sense of what's actually being called, look at a linked executable, or get a disassembler that looks at the object file's symbols and relocations to slot in symbolic names for call targets instead of showing them as calls with zero displacement.

Relative jumps to local symbols already get resolved before linking:

objdump -Mintel  -d asm-pinsrw.o:
0000000000000040 <.loop>:
  40:   0f b6 c2                movzx  eax,dl
  43:   0f b6 de                movzx  ebx,dh
  ...
 106:   49 83 c3 10             add    r11,0x10
 10a:   0f 85 30 ff ff ff       jne    40 <.loop>
0000000000000110 <.last8>:
 110:   0f b6 c2                movzx  eax,dl

Note that the relative jump to a symbol in the same file has the exact same instruction encoding as in the linked executable, even though the object file has no base address, so the disassembler just treats it as zero.
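A quick Python sketch of the encoding direction, using the addresses from the two listings (the linked rs-asmbench and the unlinked asm-pinsrw.o). encode_rel32 is a hypothetical helper, not a real assembler API.

```python
def encode_rel32(insn_addr: int, insn_len: int, target: int) -> bytes:
    """Hypothetical helper: rel32 bytes for a branch at insn_addr to target."""
    disp = target - (insn_addr + insn_len)  # relative to the next instruction
    return disp.to_bytes(4, "little", signed=True)

linked = encode_rel32(0x40216a, 6, 0x4020a0)  # jne in the executable
unlinked = encode_rel32(0x10a, 6, 0x40)       # same jne in the .o file

print(linked.hex(), unlinked.hex())  # 30ffffff 30ffffff
print(linked == unlinked)            # True
```

Only the distance between the branch and its target matters, so shifting both by the same base address leaves the bytes unchanged.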

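See Intel's reference manual for instruction encoding. Links at https://stackoverflow.com/tags/x86/info. Even in 64-bit mode, a direct call only supports 32-bit sign-extended relative offsets. 64-bit absolute targets are only available through an indirect call (call r/m64, with the address in a register or in memory); there is no call with a 64-bit immediate. (In 32-bit mode, 16-bit relative displacements are supported with an operand-size prefix, saving one instruction byte, but the resulting EIP is truncated to 16 bits, so this is rarely useful.)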
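To put the rel32 limit in concrete terms, here is a rough Python sketch that checks whether a target is reachable from a 5-byte call rel32. The function name and the sample addresses are assumptions made up for illustration.

```python
def fits_rel32(call_addr: int, target: int, insn_len: int = 5) -> bool:
    """Hypothetical check: can `call rel32` at call_addr reach target?"""
    disp = target - (call_addr + insn_len)
    return -2**31 <= disp <= 2**31 - 1  # signed 32-bit range, about +/-2 GiB

print(fits_rel32(0x401000, 0x402000))        # True: nearby code
print(fits_rel32(0x401000, 0x7fff00000000))  # False: more than 2 GiB away
```

When the check fails, the target has to be reached indirectly, for example by loading the address into a register and using call reg.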