To keep it simple, suppose we have two similar functions:
#include <stdio.h>

void f1()
{
    printf("%d", 123);
}

void f2()
{
    printf("%d", 124);
}
Now we call f1 from main and it prints 123. Once compiled, the disassembly of f1
may look like this:
08048424 <f1>:
8048424: 55 push %ebp
8048425: 89 e5 mov %esp,%ebp
8048427: 83 ec 18 sub $0x18,%esp
804842a: b8 40 86 04 08 mov $0x8048640,%eax
804842f: c7 44 24 04 7b 00 00 movl $0x7b,0x4(%esp)
8048436: 00
8048437: 89 04 24 mov %eax,(%esp)
804843a: e8 05 ff ff ff call 8048344 <printf@plt>
804843f: c9 leave
8048440: c3 ret
The machine code of f2 is similar to that of f1.
Now I want to replace f1 with the machine code of f2 at runtime. I use memcpy(f1, f2, SIZE_OF_F2_MACHINE_CODE). As expected, this causes a problem: a segmentation fault.
Is there a way to solve this? This is an ordinary user-space C program. As far as I know, code like the following can make a page writable inside the Linux kernel:
int set_page_rw(long unsigned int addr)
{
    unsigned int level;
    pte_t *pte = lookup_address(addr, &level);

    /* Set the writable bit only if it is currently clear. */
    if (!(pte->pte & _PAGE_RW))
        pte->pte |= _PAGE_RW;
    return 0;
}
but this does not work in ordinary Linux C programs. What does?
I tried to find an answer for you, but failed. All I actually managed to do was simplify the code in question:
And yes, it fails with a segmentation fault (gcc) or a memory access violation (MS VC).
EDIT:
Actually, I managed to do what you want (based on the answer of Basile Starynkevitch), but only for x86, only with gcc, and only for your specific example. Below are several code examples.
First, the simplified example.
Compile this and run it once. It will fail, but it will tell you the page size. Let's say it is 4096. Then compile the example like this:
And it should work.
Output:
Now the advanced example:
You can't use "memcpy()" here (it is commented out) because the calls to "printf()" inside "f1()" and "f2()" are relative, not absolute: the call instruction encodes an offset from its own address, so bytes copied to a different address would jump to the wrong target. I could not find a way to make them absolute (neither "-fPIC" nor "-fno-PIC" worked in my case). If you don't have relative function calls in "f1()" and "f2()", I believe you can use "memcpy()" (but I did not try).
You should also align "f1()" to the page size (unless you are sure there is enough code before the start of "f1()"). If you have gcc 4.3 or higher, you can use the attribute (it is commented out because I have gcc v4.1.2). If not, you can use that ugly and unreliable "_asm_".
Output:
And, of course, there is that horrible "if( ((char*) f2)[ i ] == 124 )". It serves to distinguish between what should be replaced (the printed number) and what should not (relative references). Clearly, this is a very simplified algorithm. You will have to implement your own, suitable for your task.
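The page-size step described in this answer can be sketched roughly as follows. This is only a minimal sketch, assuming Linux and gcc; `page_floor` and `make_code_writable` are hypothetical helper names, not taken from the examples above. It queries the page size at run time with sysconf(3) instead of hard-coding 4096, rounds the function address down to a page boundary (mprotect(2) requires a page-aligned start), and asks for read, write and execute permission on the pages that hold the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of its page; page must be a power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return addr & ~(page - 1);
}

/*
 * Hypothetical helper: request read/write/execute permission on every page
 * that overlaps [addr, addr + len).
 * Returns 0 on success and -1 on failure (errno is set by mprotect).
 */
int make_code_writable(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    uintptr_t start = page_floor((uintptr_t)addr, (uintptr_t)page);
    uintptr_t end = (uintptr_t)addr + len;

    /* mprotect needs a page-aligned start; the length is rounded up to
       whole pages by the kernel. */
    return mprotect((void *)start, end - start,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

A call such as make_code_writable((void *)f1, code_size) would come before any write into "f1()". Which bytes may then be patched safely still depends on the compiler output, as discussed above, and a hardened system may refuse the request altogether.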
Don't overwrite the procedure; overwrite the symbol reference in the symbol table instead. That does require dynamic linkage. Alternatively, you can overwrite the call(s) to the function with a call to the other function, but things like NX bits may stand in your way. Self-modifying code is generally frowned upon.

Why do you ask? If your wish is just to eventually be able to call some functions whose code was generated by the same process, you can proceed differently:

Call them through function pointers: typedef their signature before declaring the pointer, see this answer.

Generate the function and get a pointer to it. You could for example generate a C source file generated.c, fork a process, perhaps with system("gcc -fPIC -O -shared generated.c -o generated.so"); to compile it, then dlopen("./generated.so", RTLD_NOW | RTLD_GLOBAL) and get the pointer of the generated function with dlsym. See the dlopen(3) man page for details. FYI, MELT is doing that.

You could also generate the machine code of the function in memory (probably obtained with mmap(2) using the PROT_EXEC flag). Several JIT (just-in-time translation) libraries are available: GNU lightning (quick generation of slow running machine code), myjit, libjit, LLVM (slow generation of optimized machine code), LuaJIT...

If you really wish to overwrite some existing function code, you might do that, but it requires a great deal of care and is painful (e.g. because the new function code may need more space than the old one, and also because of relocation issues). Use the mmap(2) and/or mprotect(2) syscalls to get permission for such tricks. But be prepared for debugging nightmares. You may want to script your gdb debugger with your own Python scripts.

For kernel modules the story is different. I heard that some network related kernel code (iptables perhaps?) may use JIT techniques to generate machine code and run it.
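To illustrate the idea of generating machine code in memory and calling it through a typedef'd function pointer, here is a minimal sketch. It assumes Linux on x86 or x86-64; `int_fn`, `make_const_fn` and `demo` are hypothetical names invented for this illustration. The generated function only returns a constant instead of calling printf, which sidesteps the relocation issues mentioned above. The buffer is mapped writable first and switched to executable afterwards, so it is never writable and executable at the same time.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch hard-codes x86 machine code"
#endif

/* typedef the signature first, then declare pointers of that type. */
typedef int (*int_fn)(void);

/*
 * Hypothetical helper: build a tiny function that returns `value`.
 * Returns NULL on failure. The mapping is never freed in this sketch.
 */
int_fn make_const_fn(int32_t value)
{
    /* mov $imm32,%eax ; ret */
    unsigned char code[] = { 0xb8, 0, 0, 0, 0, 0xc3 };
    memcpy(code + 1, &value, sizeof value);  /* x86 is little-endian */

    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, sizeof code);

    /* Drop write permission and add execute permission. */
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        return NULL;
    }
    return (int_fn)mem;
}

/* Usage: two generated functions, loosely mirroring f1 and f2. */
int demo(void)
{
    int_fn g1 = make_const_fn(123);
    int_fn g2 = make_const_fn(124);
    assert(g1 != NULL && g2 != NULL);
    return g1() + g2();
}
```

Replacing one behaviour by another then amounts to assigning a different value to a variable of type int_fn, with no need to touch the code of an existing function. For anything beyond a toy like this, one of the JIT libraries listed above is probably the better choice.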