What is this pattern where the EBX register is used?

Posted 2019-06-28 08:18

Question:

I'm learning the basics of reverse engineering. While reversing a crackme, I came across this pattern at the beginning of almost every function:

pushl %ebp                            
movl  %esp, %ebp              
pushl %ebx              # because ebx is a callee-saved register
subl  $0x14,%esp        # of course $0x14 changes depending on the function
calll 0x08048766
addl  $0x1a5f, %ebx     # also this value sometime changes depending on the function

At 0x08048766 there is a function that does just this:

movl 0(%esp), %ebx         
retl 

So basically, as is normal, every function first sets up the ebp and esp registers. Then ebx is pushed onto the stack. This is also understandable, since ebx is a callee-saved register and it is used later in the function to reference some static data (from .rodata), for example:

leal  -0x17b7(%ebx), %eax
movl  %eax, 0(%esp) 
calll printf   

Now the most interesting (and, to me, obscure) part: if I have understood correctly, ebx is first initialized with the value pointed to by esp (using the function at 0x08048766). Why? What is in there? Isn't it an uninitialized location down in the stack?

Then another value is added to ebx. What does this value represent?

I would like to understand better how the register ebx is used in this case, and how to calculate the address it is pointing to.

You can have a look at the complete program here, but unfortunately no C source code is available.

Answer 1:

This code appears to have been compiled with -fPIC. PIC stands for "position-independent code", meaning it can be loaded at any address and still be able to access its global variables.

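In this case ebx is known as the PIC register, and it is used to point to the end of the GOT (the global offset table), i.e. the `_GLOBAL_OFFSET_TABLE_` symbol, which on i386 is the start of .got.plt. Each GOT entry holds the address of a global variable the code uses; the dynamic linker fills these entries in at load time, so the code itself never has to contain absolute addresses.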

Often, the best way to learn about these kinds of things is to compile some code yourself and look at the output. It is much easier when you have symbols to look at.

Let's do an experiment:

pic.c

int global;

int main(void)
{
    global = 4;
    return 0;
}

Compile

$ gcc -v
...
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)

$ gcc -m32 -Wall -Werror -fPIC -o pic pic.c

Sections (abbreviated)

$ readelf -S pic
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [13] .text             PROGBITS        080482f0 0002f0 000182 00  AX  0   0 16
  [15] .rodata           PROGBITS        08048488 000488 00000c 00   A  0   0  4
  [22] .got              PROGBITS        08049ffc 000ffc 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        0804a000 001000 000014 04  WA  0   0  4
  [24] .data             PROGBITS        0804a014 001014 000004 00  WA  0   0  1
  [25] .bss              NOBITS          0804a018 001018 000008 00  WA  0   0  4

Disassemble (Intel syntax because AT&T drives me nuts)

$ objdump -Mintel -d --no-show-raw-insn pic

080483eb <main>:
 80483eb:   push   ebp
 80483ec:   mov    ebp,esp
 80483ee:   call   804840b <__x86.get_pc_thunk.ax> ; EAX = EIP + 5
 80483f3:   add    eax,0x1c0d            ; EAX = 0x804a000 (.got.plt, end of .got)
 80483f8:   lea    eax,[eax+0x1c]        ; EAX = 0x804a01c (.bss + 4)

 80483fe:   mov    DWORD PTR [eax],0x4   ; set `global` to 4
 8048404:   mov    eax,0x0
 8048409:   pop    ebp
 804840a:   ret    

0804840b <__x86.get_pc_thunk.ax>:
 804840b:   mov    eax,DWORD PTR [esp]
 804840e:   ret    
 804840f:   nop

Explanation

In this case, my GCC decided to use eax as the PIC register instead of ebx.

Also, note that the compiler (GCC 5.3.1) did something interesting here. Instead of loading the variable's address from the GOT, it used the GOT only as an "anchor" and added a fixed offset to reach the variable directly in the .bss section.


Back to your code:

pushl %ebp
movl  %esp, %ebp
pushl %ebx             # because ebx is a callee-saved register
subl  $0x14, %esp      # end of typical prologue

calll 0x08048766       # probably __x86.get_pc_thunk.bx
                       # (named __i686.get_pc_thunk.bx by older GCCs).
                       # Gets the address of the instruction after this call into EBX.
                       # 32-bit x86 has no other way to read EIP without a call.

addl  $0x1a5f, %ebx    # Add the displacement to the end of the GOT.
                       # This displacement changes depending on
                       # where the function is.
                       # EBX now points to the end of the GOT.

leal  -0x17b7(%ebx), %eax   # EAX = EBX - 0x17b7
movl  %eax, 0(%esp)         # Put EAX on the stack (arg 0 to printf)
                            # EAX should point to some string
calll printf

In your code, too, the GOT isn't actually "used" (otherwise we would see an extra memory dereference through EBX); it serves as an anchor to the string, which probably lives in the read-only data section (.rodata) that comes before the GOT.

If you look at the function at 0x08048766, you'll see it looks something like this:

movl   (%esp), %ebx  # Put the return address (pushed onto the stack by the call insn)
                     # in ebx
retl                 # Return