I'm learning the basics of reverse engineering. While reversing a crackme, I noticed this pattern at the beginning of almost every function:
pushl %ebp
movl %esp, %ebp
pushl %ebx # because ebx is a callee-saved register
subl $0x14,%esp # of course $0x14 changes depending on the function
calll 0x08048766
addl $0x1a5f, %ebx # also this value sometimes changes depending on the function
At 0x08048766 there is a function that does just this:
movl 0(%esp), %ebx
retl
So basically, as is normal, every function first sets up the registers ebp and esp. Then ebx is pushed onto the stack, which is also understandable: ebx is a callee-saved register, and it is used later in the function to reference some static data (from .rodata), for example:
leal -0x17b7(%ebx), %eax
movl %eax, 0(%esp)
calll printf
Now the most interesting (and, to me, obscure) part. If I have understood correctly, ebx is first initialized with the value pointed to by esp (using the function at 0x08048766). Why? What is stored there? Isn't it an uninitialized location down in the stack? Then another value is added to ebx. What does this value represent?
I would like to understand better how the register ebx is used in this case, and how to calculate the address it points to.
You can have a look at the complete program here, but unfortunately there isn't any C source code available.
This code appears to have been compiled with -fPIC. PIC stands for "position-independent code", meaning the code can be loaded at any address and is still able to access its global variables.
In this case ebx is known as the PIC register, and it is used to point to the end of the GOT (the global offset table). The GOT holds an entry for each global variable being used; at load time the dynamic linker fills each entry with that variable's actual address.
Often, the best way to learn about these kinds of things is to compile some code yourself and look at the output. It is especially easy when you have the symbols to look at.
Let's do an experiment:
pic.c
int global;

int main(void)
{
    global = 4;
    return 0;
}
Compile
$ gcc -v
...
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
$ gcc -m32 -Wall -Werror -fPIC -o pic pic.c
Sections (abbreviated)
$ readelf -S pic
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [13] .text             PROGBITS        080482f0 0002f0 000182 00  AX  0   0 16
  [15] .rodata           PROGBITS        08048488 000488 00000c 00   A  0   0  4
  [22] .got              PROGBITS        08049ffc 000ffc 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        0804a000 001000 000014 04  WA  0   0  4
  [24] .data             PROGBITS        0804a014 001014 000004 00  WA  0   0  1
  [25] .bss              NOBITS          0804a018 001018 000008 00  WA  0   0  4
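When decoding addresses like the ones computed below, it helps to know which section an address falls into. Here is a small C sketch that hard-codes the Addr and Size columns from the readelf output above; `section_of` is a hypothetical helper name, and the table only covers the sections listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addr and Size columns copied from the readelf -S output above. */
struct section {
    const char *name;
    uint32_t addr;
    uint32_t size;
};

static const struct section sections[] = {
    { ".text",    0x080482f0u, 0x182u },
    { ".rodata",  0x08048488u, 0x00cu },
    { ".got",     0x08049ffcu, 0x004u },
    { ".got.plt", 0x0804a000u, 0x014u },
    { ".data",    0x0804a014u, 0x004u },
    { ".bss",     0x0804a018u, 0x008u },
};

/* Returns the name of the listed section containing addr, or NULL. */
static const char *section_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof sections / sizeof sections[0]; i++) {
        const struct section *s = &sections[i];
        if (addr >= s->addr && addr - s->addr < s->size)
            return s->name;
    }
    return NULL;
}
```

For example, `section_of(0x0804a01c)` returns ".bss" and `section_of(0x0804a000)` returns ".got.plt".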
Disassemble (Intel syntax because AT&T drives me nuts)
$ objdump -Mintel -d --no-show-raw-insn pic
080483eb <main>:
80483eb: push ebp
80483ec: mov ebp,esp
80483ee: call 804840b <__x86.get_pc_thunk.ax> ; EAX = EIP + 5
80483f3: add eax,0x1c0d ; EAX = 0x804a000 (.got.plt, end of .got)
80483f8: lea eax,[eax+0x1c] ; EAX = 0x804a01c (.bss + 4)
80483fe: mov DWORD PTR [eax],0x4 ; set `global` to 4
8048404: mov eax,0x0
8048409: pop ebp
804840a: ret
0804840b <__x86.get_pc_thunk.ax>:
804840b: mov eax,DWORD PTR [esp]
804840e: ret
804840f: nop
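The comments in the disassembly can be checked with plain arithmetic. The C sketch below uses only constants read off the listing above: the call is at 0x80483ee and is 5 bytes long (opcode E8 plus a 4-byte relative offset), which is why the next instruction, and thus the pushed return address, is at 0x80483f3. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Constants read off the main() disassembly. */
#define CALL_ADDR 0x080483eeu /* call __x86.get_pc_thunk.ax */
#define CALL_LEN  5u          /* call rel32: E8 + 4-byte offset */
#define ADD_IMM   0x00001c0du /* add eax,0x1c0d */
#define LEA_DISP  0x0000001cu /* lea eax,[eax+0x1c] */

/* What the thunk copies into EAX: the return address pushed by call. */
static uint32_t thunk_result(void) { return CALL_ADDR + CALL_LEN; }

/* EAX after the add: the GOT anchor (start of .got.plt). */
static uint32_t got_anchor(void) { return thunk_result() + ADD_IMM; }

/* EAX after the lea: the address of `global` in .bss. */
static uint32_t global_addr(void) { return got_anchor() + LEA_DISP; }
```

`thunk_result()` gives 0x80483f3, `got_anchor()` gives 0x804a000 and `global_addr()` gives 0x804a01c, matching the comments.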
Explanation
In this case, my GCC decided to use eax as the PIC register instead of ebx.
Also, note that the compiler (GCC 5.3.1) did something interesting here. Instead of accessing the variable via the GOT, it essentially used the GOT as an "anchor" and offset directly from it to the variable in the .bss section.
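The difference between the two access styles is one extra memory load. The C sketch below models a tiny 32-bit memory and contrasts them. Everything in it is a toy: `load32`, `store32` and the two setter names are hypothetical, and the GOT slot at 0x8049ffc (just below the anchor) is only assumed to hold `global`'s address for illustration; in the real binary that .got entry is not necessarily used for `global`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BASE 0x08049ff0u  /* toy memory covers 0x08049ff0..0x0804a02f */
static uint8_t mem[0x40];

static uint32_t load32(uint32_t addr)
{
    uint32_t v;
    assert(addr >= MEM_BASE && addr - MEM_BASE + 4 <= sizeof mem);
    memcpy(&v, &mem[addr - MEM_BASE], sizeof v);
    return v;
}

static void store32(uint32_t addr, uint32_t v)
{
    assert(addr >= MEM_BASE && addr - MEM_BASE + 4 <= sizeof mem);
    memcpy(&mem[addr - MEM_BASE], &v, sizeof v);
}

/* GOTOFF style (what GCC emitted above): anchor + constant, one store.
 *   lea eax,[eax+off] ; mov DWORD PTR [eax],val */
static void set_via_gotoff(uint32_t anchor, int32_t off, uint32_t val)
{
    store32(anchor + (uint32_t)off, val);
}

/* GOT style: first load the variable's address from its GOT slot, then
 * store through it. This load is the "second memory dereference".
 *   mov eax,[eax+slot] ; mov DWORD PTR [eax],val */
static void set_via_got(uint32_t anchor, int32_t slot, uint32_t val)
{
    uint32_t var_addr = load32(anchor + (uint32_t)slot);
    store32(var_addr, val);
}
```

GCC can use the GOTOFF form here because `global` is defined in the same executable, so its distance from the anchor is fixed at link time.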
Back to your code:
pushl %ebp
movl %esp, %ebp
pushl %ebx                # because ebx is a callee-saved register
subl $0x14,%esp           # end of typical prologue
calll 0x08048766          # probably __x86.get_pc_thunk.bx
                          # (__i686.get_pc_thunk.bx in older GCC).
                          # Loads the return address, i.e. the address of
                          # the addl below, into EBX. 32-bit x86 has no
                          # instruction that reads EIP directly, so a call
                          # is the usual trick.
addl $0x1a5f, %ebx        # Add the displacement to the end of the GOT.
                          # The linker computes it from the distance between
                          # this instruction and the GOT, so it changes
                          # depending on where the function is.
                          # EBX now points to the end of the GOT.
leal -0x17b7(%ebx), %eax  # EAX = EBX - 0x17b7
movl %eax, 0(%esp)        # Put EAX on the stack (arg 0 to printf);
                          # EAX should point to some string
calll printf
In your code too, the compiler didn't actually "use" the GOT (otherwise we would see a second memory dereference); it used it as an anchor to the string, which is probably in the read-only data section (.rodata), which also comes before the GOT.
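Because both immediates are known, the string's address depends only on where the call sits: EBX = ret + 0x1a5f, and the string is at EBX - 0x17b7 = ret + 0x2a8, where ret is the address of the addl (the instruction right after the call). The question doesn't give that address, so you would read it from your disassembly; the 0x08048600 used below is hypothetical, as is the helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Immediates from the question's code. */
#define GOT_DISP 0x1a5fu  /* addl $0x1a5f, %ebx */
#define STR_OFF  0x17b7u  /* leal -0x17b7(%ebx), %eax */

/* ret_addr: address of the addl, i.e. the return address pushed by
 * "calll 0x08048766". Returns the address of printf's format string. */
static uint32_t string_addr(uint32_t ret_addr)
{
    uint32_t ebx = ret_addr + GOT_DISP; /* GOT anchor */
    return ebx - STR_OFF;               /* anchor - 0x17b7 */
}
```

So if the addl sat at 0x08048600, the string would be at 0x080488a8; you can then check with `readelf -S` that this falls inside .rodata.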
If you look at the function at 0x08048766, you'll see it looks something like this:
movl (%esp), %ebx  # Copy the return address (pushed onto the stack
                   # by the call instruction) into ebx
retl               # Return to the caller, which continues at the addl
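To answer the question's "isn't it an uninitialized location?" directly: at the moment the thunk runs, the top of the stack holds the return address that `call` just pushed. The C sketch below models only that: a downward-growing stack, a `call` that pushes the next instruction's address, and the thunk that reads the top of the stack before `ret` pops it. The struct, the function names and the caller address 0x08048600 are hypothetical, and `esp` here is an array index, not a real address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

/* Toy CPU state; esp indexes stack[] and grows downward. */
struct cpu {
    uint32_t eip, esp, ebx;
    uint32_t stack[STACK_WORDS];
};

static void push(struct cpu *c, uint32_t v)
{
    assert(c->esp > 0);
    c->stack[--c->esp] = v;
}

static uint32_t pop(struct cpu *c)
{
    assert(c->esp < STACK_WORDS);
    return c->stack[c->esp++];
}

/* calll target: push the address of the next instruction, then jump. */
static void call(struct cpu *c, uint32_t target, uint32_t insn_len)
{
    push(c, c->eip + insn_len);
    c->eip = target;
}

/* The thunk: movl (%esp), %ebx ; retl */
static void get_pc_thunk_bx(struct cpu *c)
{
    c->ebx = c->stack[c->esp]; /* read the return address, no pop */
    c->eip = pop(c);           /* retl pops it back into eip */
}
```

After `call(&c, 0x08048766, 5)` from 0x08048600, the thunk leaves 0x08048605 in both ebx and eip, and the stack is balanced again.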