I was reading a CodeBreakers Journal article on self-modifying code, and it included this code snippet:
void Demo(int (*_printf) (const char *,...))
{
    _printf("Hello, OSIX!\n");
    return;
}
int main(int argc, char* argv[])
{
    char buff[1000];
    int (*_printf) (const char *,...);
    int (*_main) (int, char **);
    void (*_Demo) (int (*) (const char *,...));
    _printf=printf;
    int func_len = (unsigned int) _main - (unsigned int) _Demo;
    for (int a=0; a<func_len; a++)
        buff[a] = ((char *) _Demo)[a];
    _Demo = (void (*) (int (*) (const char *,...))) &buff[0];
    _Demo(_printf);
    return 0;
}
This code supposedly executes Demo() on the stack. I understand most of the code, but the part where they assign 'func_len' confuses me. As far as I can tell, they're subtracting one random pointer address from another random pointer address.
Someone care to explain?
The code relies on knowing how the compiler lays out functions in memory, which may not hold for other compilers.
The func_len line, once corrected to include the '-' that was originally missing, determines the length of the function Demo by subtracting the address in _Demo (which is supposed to contain the start address of Demo()) from the address in _main (which is supposed to contain the start address of main()). The result is presumed to be the length of the function Demo, which is then copied byte-wise into the buffer buff. The address of buff is then coerced into a function pointer and the function is called through it. However, since neither _Demo nor _main is actually initialized, the code is buggy in the extreme. Also, it is not clear that an unsigned int is big enough to hold pointers accurately; the cast should probably be to a uintptr_t from <stdint.h> or <inttypes.h>.
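For what it's worth, here is a rough sketch of what the length calculation was presumably meant to do, using real function addresses and uintptr_t. The helper name span_between is made up for illustration. Converting a function pointer to uintptr_t is not guaranteed by the C standard (it works on common platforms), and nothing guarantees that main() is placed directly after Demo().

```c
#include <stddef.h>
#include <stdint.h>

/* Same function as in the question. */
void Demo(int (*print)(const char *, ...))
{
    print("Hello, OSIX!\n");
}

/*
 * Hypothetical helper: number of bytes from `start` up to `end`.
 * Returns 0 when `end` does not lie after `start`, which is what you get
 * if the compiler did not lay the functions out in the assumed order.
 */
size_t span_between(uintptr_t start, uintptr_t end)
{
    return end > start ? (size_t)(end - start) : 0;
}

/*
 * Intended use inside main(), taking both addresses from the real
 * functions instead of from uninitialized pointers (assumes main()
 * follows Demo() in memory, which is compiler-dependent):
 *
 *     size_t func_len = span_between((uintptr_t)Demo, (uintptr_t)main);
 */
```

A result of 0 here is a hint that the layout assumption failed; a non-zero result is still only a guess at the length, since padding or other functions may sit between the two addresses.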
This works if the bugs are fixed, if the assumptions about the code layout are correct, if the code is position-independent code, and if there are no protections against executing data space. It is unreliable, non-portable and not recommended. But it does illustrate, if it works, that code and data are very similar.
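The copy step is where code gets treated as plain data: the function's bytes are just read and stored like any others. The loop in the question trusts func_len blindly, though, and would overrun the 1000-byte buff if the computed length were wrong. A guarded version of that step might look like this sketch; the name copy_bytes_checked is hypothetical, and the sketch only copies bytes, it does not execute anything.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `len` bytes from `src` into `dst` only if
 * they fit into the `dst_cap` bytes available at `dst`.
 * Returns the number of bytes copied, or 0 if the copy was refused.
 */
size_t copy_bytes_checked(unsigned char *dst, size_t dst_cap,
                          const void *src, size_t len)
{
    if (dst == NULL || src == NULL || len == 0 || len > dst_cap)
        return 0;
    memcpy(dst, src, len);
    return len;
}
```

In the question's main(), the byte-wise loop would become a single call with buff, sizeof buff, the start address of Demo and func_len, followed by a check that the return value is non-zero before going any further.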
I remember pulling a similar stunt between two processes, copying a function from one program into shared memory, and then having the other program execute that function from shared memory. It was about a quarter of a century ago, but the technique was similar and 'worked' for the machine it was tried on. I've never needed to use the technique since, thank goodness!
This code uses the uninitialized variables _main and _Demo, so it cannot work in general. Even if the authors meant something different, they probably assumed some specific ordering of functions in memory.
My opinion: don't trust this article.