I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?
char* str = "Hello World\n";
int main(int argc, char* argv) {
printf(str);
FILE * file = fopen(argv, "r+");
fseek(file, 0x1000, SEEK_SET);
fputs("Goodbyewrld\n", file);
fclose(file);
return 0;
}
This doesn't work, I'm assuming there's something preventing it from opening itself since I can split this into two separate programs (A "Hello World" and something to modify it) and it works fine.
EDIT: My understanding is that when the program is run, it's loaded completely into ram. So the executable on the hard drive is, for all intents and purposes a copy. Why would it be a problem for it to modify itself?
Is there a workaround?
Thanks
If you operating on Windows, I believe it locks the file to prevent it from being modified while its being run. Thats why you often needs to exit a program in order to install an update. The same is not true on a linux system.
It is very operating system dependent. Some operating systems lock the file, so you could try to cheat by making a new copy of it somewhere, but the you're just running another compy of the program.
Other operating systems do security checks on the file, e.g. iPhone, so writing it will be a lot of work, plus it resides as a readonly file.
With other systems you might not even know where the file is.
On Windows, when a program is run the entire
*.exe
file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.When the file is mapped in this way, another application (including itself) can't write to the same file to change it while it's running. (Also, on Windows the running executable can't be renamed either, but it can on Linux and other Unix systems with inode-based filesystems).
It is possible to change the bits mapped into memory, but if you do this the OS does it using "copy-on-write" semantics, which means that the underlying file isn't changed on disk, but a copy of the page(s) in memory is made with your modifications. Before being allowed to do this though, you usually have to fiddle with protection bits on the memory in question (e.g.
VirtualProtect
).At one time, it used to be common for low-level assembly programs that were in very constrained memory environments to use self-modifying code. However, nobody does this anymore because we're not running in the same constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.
There are non-portable ways to do this on many platforms. In Windows you can do this with
WriteProcessMemory()
, for example. However, in 2010 it's usually a very bad idea to do this. This isn't the days of DOS where you code in assembly and do this to save space. It's very hard to get right, and you're basically asking for stability and security problems. Unless you are doing something very low-level like a debugger I would say don't bother with this, the problems you will introduce are not worth whatever gain you might have.If you are using Windows, you can do the following:
Step-by-Step Example:
VirtualProtect()
on the code pages you want to modify, with thePAGE_WRITECOPY
protection.VirtualProtect()
on the modified code pages, with thePAGE_EXECUTE
protection.FlushInstructionCache()
.For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)
Self-modifying code is used for modifications in memory, not in file (like run-time unpackers as UPX do). Also, the file representation of a program is more difficult to operate because of relative virtual addresses, possible relocations and modifications to the headers needed for most updates (eg. by changing the
Hello world!
tolonger Hello World
you'll need to extend the data segment in file).I'll suggest that you first learn to do it in memory. For file updates the simplest and more generic approach would be running a copy of the program so that it would modify the original.
EDIT: And don't forget about the main reasons the self-modifying code is used:
1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple statical analysis of the file.
2) Performance, something like JIT.
None of them benefits from modifying the executable.