I was reading a codebreakers journal article on self-modifying code and there was this code snippet:
void Demo(int (*_printf) (const char *,...))
{
    _printf("Hello, OSIX!\n");
    return;
}
int main(int argc, char* argv[])
{
    char buff[1000];
    int (*_printf) (const char *,...);
    int (*_main) (int, char **);
    void (*_Demo) (int (*) (const char *,...));
    _printf=printf;
    int func_len = (unsigned int) _main - (unsigned int) _Demo;
    for (int a=0; a<func_len; a++)
        buff[a] = ((char *) _Demo)[a];
    _Demo = (void (*) (int (*) (const char *,...))) &buff[0];
    _Demo(_printf);
    return 0;
}
This code supposedly executed Demo() on the stack. I understand most of the code, but the part where they assign 'func_len' confuses me. As far as I can tell, they're subtracting one random pointer address from another random pointer address.
Someone care to explain?
This code uses uninitialized variables `_main` and `_Demo`, so it cannot work in general. Even if they meant something different, they probably assumed some specific ordering of functions in memory. My opinion: don't trust this article.
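For illustration, here is a minimal sketch of what the article presumably intended: take the addresses from the functions themselves, do the arithmetic in `uintptr_t`, and refuse to copy when the layout assumption does not hold. The names `DemoEnd`, `code_span` and `copy_demo` are hypothetical and not from the article. `DemoEnd` stands in for `main()` as the "next function" marker. The sketch assumes a platform where function addresses convert to integers and code bytes are readable, which ISO C does not promise, and it deliberately does not execute the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The function whose machine code we want to measure and copy. */
void Demo(int (*print_fn)(const char *, ...))
{
    print_fn("Hello, OSIX!\n");
}

/* Hypothetical end marker, defined right after Demo in the source.
 * Assumption: the compiler also places it right after Demo in memory.
 * Nothing in the C standard guarantees that ordering. */
void DemoEnd(void)
{
}

/* Returns end - start in bytes, or 0 if end does not lie after start. */
size_t code_span(uintptr_t start, uintptr_t end)
{
    return end > start ? (size_t)(end - start) : 0;
}

/* Copies Demo's presumed code bytes into buf.
 * Returns the number of bytes copied, or 0 if the layout assumption
 * fails or the span does not fit into cap bytes. */
size_t copy_demo(unsigned char *buf, size_t cap)
{
    assert(buf != NULL);

    /* Initialized from the real functions, unlike _Demo and _main above. */
    uintptr_t start = (uintptr_t)&Demo;
    uintptr_t end = (uintptr_t)&DemoEnd;

    size_t len = code_span(start, end);
    if (len == 0 || len > cap)
        return 0;

    memcpy(buf, (const void *)start, len);
    return len;
}
```

Calling `copy_demo(buff, sizeof buff)` and checking for a non-zero result is about as far as this can be pushed safely. Whether the copied bytes would still run from another address is a separate question.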
The code is relying on knowledge of the layout of functions from the compiler - which may not be reliable with other compilers.
The `func_len` line, once corrected to include the `-` that was originally missing, determines the length of the function `Demo` by subtracting the address in `_Demo` (which is supposed to contain the start address of `Demo()`) from the address in `_main` (which is supposed to contain the start address of `main()`). This is presumed to be the length of the function `Demo`, which is then copied byte-wise into the buffer `buff`. The address of `buff` is then coerced into a function pointer and the function is then called. However, since neither `_Demo` nor `_main` is actually initialized, the code is buggy in the extreme. Also, it is not clear that an `unsigned int` is big enough to hold pointers accurately; the cast should probably be to a `uintptr_t` from `<stdint.h>` or `<inttypes.h>`.
This works if the bugs are fixed, if the assumptions about the code layout are correct, if the code is position-independent code, and if there are no protections against executing data space. It is unreliable, non-portable and not recommended. But it does illustrate, if it works, that code and data are very similar.
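To make the "code and data are very similar" point concrete without depending on how the compiler lays out `Demo()`, here is a rough sketch that copies a few hand-written bytes into memory and calls them. It is not the article's method. It assumes a POSIX system with `mmap` and `mprotect`, and an x86-64 CPU, since the six bytes encode `mov eax, 42; ret` for that architecture only. The function name `run_from_data` is made up for this example. Because most current systems refuse to execute the stack, the sketch asks for a separate page and flips it from writable to executable, and it returns -1 wherever that is unsupported or refused.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copies a tiny routine into fresh memory, makes that memory
 * executable, and calls it. Returns the routine's result (42),
 * or -1 if the platform is unsupported or refuses the mapping. */
int run_from_data(void)
{
#if defined(__x86_64__)
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    const size_t len = 4096; /* one page is plenty for six bytes */
    assert(sizeof code <= len);

    /* Writable but not yet executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* At this point the routine is just data being copied. */
    memcpy(mem, code, sizeof code);

    /* Now ask the OS to treat the same bytes as code. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Object-to-function pointer conversion: fine on POSIX systems,
     * not guaranteed by ISO C. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, len);
    return result;
#else
    return -1; /* the byte sequence above is x86-64 only */
#endif
}
```

The hand-written routine touches no absolute addresses, so it is trivially position-independent. A compiled `Demo()` copied the same way would additionally need every reference inside it, such as the string literal, to survive the move.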
I remember pulling a similar stunt between two processes, copying a function from one program into shared memory, and then having the other program execute that function from shared memory. It was about a quarter of a century ago, but the technique was similar and 'worked' for the machine it was tried on. I've never needed to use the technique since, thank goodness!