Writing a C function from given x86 assembly

Posted 2019-06-24 10:16

I'm trying to reverse engineer this mystery function. It returns an integer and takes a pointer to a struct as its argument:

#include "mystery.h"
int mystery(struct my_struct *s){}

The header file contains a simple struct declaration:

struct my_struct {
    int a;
    int b; 
};

The assembly I'm trying to reverse engineer is

400596:    8b 07                    mov    (%rdi),%eax
400598:    8d 04 40                 lea    (%rax,%rax,2),%eax
40059b:    89 07                    mov    %eax,(%rdi)
40059d:    83 47 04 07              addl   $0x7,0x4(%rdi)
4005a1:    c3                       retq  

So far I think the function is like:

int mystery(struct my_struct *s){
    int i = s->a;
    i = 3*i;
    int j = s->b;
    j += 7;
    return i;
}

But this isn't correct. I don't understand what mov %eax,(%rdi) does exactly, or what the function returns in the end, because it's supposed to return an integer.

1 answer
我只想做你的唯一
Reply #2 · 2019-06-24 10:29

Given that RDI holds the pointer to the beginning of the structure (the first parameter of the function), the following line reads the value of s->a and places it in the temporary register EAX.

mov    (%rdi),%eax

Reasonably that might be int x = s->a. This line:

lea    (%rax,%rax,2),%eax

is the same as multiplying the temporary value by 3, since RAX + RAX*2 = 3*RAX (thus s->a * 3). So the first two lines of assembly could be represented as:

int x = s->a * 3;
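As a rough illustration of why that lea is a multiply by 3, the address computation base + index*scale can be modelled in plain C. The helper names lea_base_index_scale and times3 are hypothetical and only exist for this sketch; the cast to uint32_t stands in for writing the result to the 32-bit register EAX.

```c
#include <stdint.h>

/* Hypothetical model of `lea (base,index,scale),%eax`:
   compute base + index*scale and keep only the low 32 bits,
   as happens when the destination is a 32-bit register. */
uint32_t lea_base_index_scale(uint64_t base, uint64_t index, unsigned scale)
{
    return (uint32_t)(base + index * scale);
}

/* lea (%rax,%rax,2),%eax uses RAX as both base and index with scale 2,
   so the result is rax + rax*2 = 3*rax (truncated to 32 bits). */
uint32_t times3(uint32_t x)
{
    return lea_base_index_scale(x, x, 2);
}
```

No memory is accessed here: lea only performs the arithmetic, which is why compilers like to use it for small constant multiplications.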

The line mov %eax,(%rdi) takes the temporary value x and stores it back to s->a, so it could be represented as:

s->a = x;

The line addl $0x7,0x4(%rdi) adds 7 to the 32-bit value at 4(RDI), that is, 4 bytes past the start of the structure. Since a is a 4-byte int at offset 0, 4(RDI) is the address of s->b. This line could be represented as s->b += 7;.
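The claim that 4(RDI) is s->b can be checked from C with offsetof. This is a small sketch that assumes a 4-byte int, as on typical x86-64 targets; layout_matches is a hypothetical helper name, not something from the exercise.

```c
#include <stddef.h>

struct my_struct {
    int a;
    int b;
};

/* Hypothetical helper: returns 1 when the struct layout agrees with the
   offsets used by the assembly, i.e. a at (%rdi) and b at 0x4(%rdi).
   Assumes int is 4 bytes wide, which holds on typical x86-64 ABIs. */
int layout_matches(void)
{
    return offsetof(struct my_struct, a) == 0
        && offsetof(struct my_struct, b) == 4;
}
```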

So what is being returned? An int return value is passed back in EAX, and nothing else is done with EAX after the code analyzed above, so EAX still holds the value it had when we did x = s->a * 3;. This means that the function returns the temporary value x.

The code then would look like this:

int mystery(struct my_struct *s)
{
    int x = s->a * 3;
    s->a = x;
    s->b += 7;
    return x;    
}

If you compile this code with GCC 4.9.x on godbolt at the -O1 optimization level, you get this generated assembly:

mystery:
        movl    (%rdi), %eax
        leal    (%rax,%rax,2), %eax
        movl    %eax, (%rdi)
        addl    $7, 4(%rdi)
        ret

Different compilers at different optimization levels will produce different assembly that all does the same thing. GCC 4.9.x just so happens to produce the exact assembly code we originally reverse engineered.
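Independently of the compiler output, the reconstruction can be sanity-checked by running it on sample values and confirming the three observable effects: the return value, the new s->a, and the new s->b. The checker effects_match below is a hypothetical helper for this sketch, and it assumes inputs small enough that 3*a and b+7 do not overflow an int.

```c
struct my_struct {
    int a;
    int b;
};

/* The reconstructed function from the analysis above. */
int mystery(struct my_struct *s)
{
    int x = s->a * 3;
    s->a = x;
    s->b += 7;
    return x;
}

/* Hypothetical checker: returns 1 when the return value and both
   struct fields match what the assembly analysis predicts.
   Assumes 3*a and b+7 fit in an int. */
int effects_match(int a, int b)
{
    struct my_struct s = { a, b };
    int ret = mystery(&s);
    return ret == 3 * a && s.a == 3 * a && s.b == b + 7;
}
```

For example, starting from a = 5 and b = 10, the function returns 15 and leaves the struct holding a = 15 and b = 17.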


Note: I guessed at the compiler version and optimization level because of a recent SO question with a different mystery function, where I found that GCC 4.9.x at optimization level -O1 generated the exact code I was looking for. It seems whoever generated the assembly files for these mystery exercises was using a similar compiler and settings.
