Question:

Problem

I wish to inject an object file into an existing binary. As a concrete example, consider a source Hello.c:

#include <stdlib.h>

int main(void)
{
    return EXIT_SUCCESS;
}

It can be compiled to an executable named Hello through gcc -std=gnu99 -Wall Hello.c -o Hello. Furthermore, now consider Embed.c:

void func1(void)
{
}

An object file Embed.o can be created from this through gcc -c Embed.c. My question is how to generically insert Embed.o into Hello in such a way that the necessary relocations are performed, and the appropriate ELF internal tables (e.g. symbol table, PLT, etc.) are patched properly?

Assumptions

It can be assumed that the object file to be embedded has its dependencies statically linked already. Any dynamic dependencies, such as the C runtime can be assumed to be present also in the target executable.

Current Attempts/Ideas

Use libbfd to copy sections from the object file into the binary. So far I can create a new object containing the sections from the original binary and the sections from the object file. The problem is that, since the object file is relocatable, its sections cannot be copied properly to the output without performing the relocations first.
Convert the binary back to an object file and relink with ld. So far I have tried using objcopy to perform the conversion: objcopy --input elf64-x86-64 --output elf64-x86-64 Hello Hello.o. Evidently this does not work as I intend, since ld -o Hello2 Embed.o Hello.o then fails with ld: error: Hello.o: unsupported ELF file type 2. I guess this should be expected, since Hello is an executable and not an object file.
Find an existing tool which performs this sort of insertion?
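
The "file type 2" in the ld error above is the e_type field of the ELF header: 2 is ET_EXEC (executable), whereas an object file such as Embed.o has type 1 (ET_REL). The following is a minimal sketch of reading that field from the first bytes of a file. The function names are hypothetical, and the constants are written out locally instead of being taken from <elf.h> so that the fragment stands on its own.

```c
#include <stddef.h>

/* Values from the ELF specification (normally provided by <elf.h>). */
enum { ELF_ET_REL = 1, ELF_ET_EXEC = 2, ELF_ET_DYN = 3 };

/*
 * Return the e_type field of an ELF header held in `hdr`, or -1 if the
 * buffer is too short or does not start with the ELF magic.
 * e_type is a 16-bit field at byte offset 16 in both ELF32 and ELF64.
 */
int elf_file_type(const unsigned char *hdr, size_t len)
{
    if (len < 18)
        return -1;
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return -1;

    /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian. */
    if (hdr[5] == 1)
        return hdr[16] | (hdr[17] << 8);
    if (hdr[5] == 2)
        return (hdr[16] << 8) | hdr[17];
    return -1;
}

/* ET_REL is the type a static linker expects for an object file input. */
int elf_is_relocatable(const unsigned char *hdr, size_t len)
{
    return elf_file_type(hdr, len) == ELF_ET_REL;
}
```

Reading the first 18 bytes of Hello.o (the objcopy output) and passing them to elf_file_type should give 2, which matches the error message: copying the file did not change its type.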

Rationale (Optional Read)

I am making a static executable editor, where the vision is to allow the instrumentation of arbitrary user-defined routines into an existing binary. This will work in two steps:

The injection of an object file (containing the user-defined routines) into the binary. This is a mandatory step and cannot be worked around by alternatives such as injecting a shared object instead.
Performing static analysis on the new binary and using this to statically detour routines from the original code to the newly added code.

I have, for the most part, already completed the work necessary for step 2, but I am having trouble with the injection of the object file. The problem is definitely solvable given that other tools use the same method of object injection (e.g. EEL).

Answer 1:

If it were me, I'd compile Embed.c into a shared object, libembed.so, like so:

gcc -Wall -shared -fPIC -o libembed.so Embed.c

That should create a position-independent shared object from Embed.c. With that, you can force your target binary to load this shared object by setting the environment variable LD_PRELOAD when running it (see the ld.so(8) manual page for details):

LD_PRELOAD=/path/to/libembed.so ./Hello

The "trick" here will be to figure out how to do your instrumentation, especially considering it's a static executable. There, I can't help you, but this is one way to get code into a process's memory space. You'll probably want to do some sort of initialization in a constructor, which you can do with an attribute (if you're using gcc, at least):

void __attribute__ ((constructor)) my_init(void)
{
    // put code here!
}

Answer 2:

Assuming the source code for the first executable is available and it is compiled with a linker script that allocates space for later object files, there is a relatively simple solution. Since I am currently working on an ARM project, the examples below are compiled with the GNU ARM cross-compiler.

Primary source code file, hello.c

#include <stdio.h>

int main ()
{

   return 0;
}

is built with a simple linker script allocating space for an object to be embedded later:

SECTIONS
{
    .text :
    {
        KEEP (*(embed)) ;

        *(.text .text*) ;
    }
}

Build it and inspect the symbol table like this:

arm-none-eabi-gcc -nostartfiles -Ttest.ld -o hello hello.c
readelf -s hello

Num:    Value  Size Type    Bind   Vis      Ndx Name
 0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 00000000     0 SECTION LOCAL  DEFAULT    1 
 2: 00000000     0 SECTION LOCAL  DEFAULT    2 
 3: 00000000     0 SECTION LOCAL  DEFAULT    3 
 4: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
 5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
 6: 00000000     0 FILE    LOCAL  DEFAULT  ABS 
 7: 00000000    28 FUNC    GLOBAL DEFAULT    1 main

Now let's compile the object to be embedded, whose source is in embed.c:

void func1()
{
   /* Something useful here */
}

Relink with the same linker script, this time inserting the new symbols:

arm-none-eabi-gcc -c embed.c
arm-none-eabi-gcc -nostartfiles -Ttest.ld -o new_hello hello embed.o

See the results:

readelf -s new_hello
Num:    Value  Size Type    Bind   Vis      Ndx Name
 0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 00000000     0 SECTION LOCAL  DEFAULT    1 
 2: 00000000     0 SECTION LOCAL  DEFAULT    2 
 3: 00000000     0 SECTION LOCAL  DEFAULT    3 
 4: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
 5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
 6: 00000000     0 FILE    LOCAL  DEFAULT  ABS 
 7: 00000000     0 FILE    LOCAL  DEFAULT  ABS embed.c
 8: 0000001c     0 NOTYPE  LOCAL  DEFAULT    1 $a
 9: 00000000     0 FILE    LOCAL  DEFAULT  ABS 
10: 0000001c    20 FUNC    GLOBAL DEFAULT    1 func1
11: 00000000    28 FUNC    GLOBAL DEFAULT    1 main

Answer 3:

The problem is that .o files are not fully linked yet, and most references are still symbolic. Binaries (shared libraries and executables) are one step closer to fully linked code.

Linking the code into a shared library doesn't mean you must load it via the dynamic loader. The suggestion is rather that writing your own loader for an executable or shared library might be simpler than writing one for a .o file.

Another possibility would be to customize the linking process yourself: call the linker and have it link the code to be loaded at some fixed address. You might also look at how bootloaders are prepared, which also involves a basic linking step to do exactly this (fixing a piece of code to a known load address).

If you don't link to a fixed address and want to relocate at run time, you will have to write a basic linker that takes the object file and relocates it to the destination address by applying the appropriate fixups.
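
As a rough illustration of what such a fixup involves, here is a minimal sketch that applies a single RELA-style relocation to a section image. It assumes an x86-64 target (as in the question) and handles only two relocation types, R_X86_64_64 (value S + A) and R_X86_64_PC32 (value S + A - P). The function name and parameter layout are hypothetical; a real tool would read these values from the object file's relocation and symbol tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Relocation type numbers from the x86-64 psABI (see <elf.h>). */
enum { RELOC_X86_64_64 = 1, RELOC_X86_64_PC32 = 2 };

/* Store the low `n` bytes of `v` at `p` in little-endian order. */
static void put_le(unsigned char *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/*
 * Apply one relocation to a section image.
 *   sec, sec_size : bytes of the section being patched
 *   sec_addr      : address the section will have in the target binary
 *   offset        : r_offset, position of the field inside the section
 *   type          : relocation type (only two are handled here)
 *   sym_addr      : S, final address of the referenced symbol
 *   addend        : A, the r_addend value
 * Returns 0 on success, -1 on unsupported type, bad offset or overflow.
 */
int apply_reloc(unsigned char *sec, size_t sec_size, uint64_t sec_addr,
                uint64_t offset, int type, uint64_t sym_addr, int64_t addend)
{
    uint64_t place = sec_addr + offset;            /* P */
    uint64_t target = sym_addr + (uint64_t)addend; /* S + A */

    switch (type) {
    case RELOC_X86_64_64:
        if (offset > sec_size || sec_size - offset < 8)
            return -1;
        put_le(sec + offset, target, 8);
        return 0;
    case RELOC_X86_64_PC32: {
        if (offset > sec_size || sec_size - offset < 4)
            return -1;
        int64_t rel = (int64_t)(target - place);   /* S + A - P */
        if (rel < INT32_MIN || rel > INT32_MAX)
            return -1;                             /* does not fit in 32 bits */
        put_le(sec + offset, (uint64_t)rel, 4);
        return 0;
    }
    default:
        return -1;
    }
}
```

A basic linker of the kind described would call something like this once per relocation entry, after it has decided where each section and symbol ends up in the destination binary.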

I assume you already have it, seeing that this is your master's thesis, but this book is the standard introduction to the subject: http://www.iecc.com/linker/

Answer 4:

You must make room for the relocatable code in the executable by extending the executable's text segment, just like a virus infection. Then, after writing the relocatable code into that space, update the symbol table by adding symbols for everything defined in the relocatable object, and apply the necessary relocation computations. I've written code that does this pretty well with 32-bit ELF files.
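
The first step, making room, is mostly address arithmetic. The sketch below is one possible way to plan it, not the code this answer refers to: given the text segment's p_vaddr and p_filesz, it computes where appended code would live, how large the segment becomes, and by how many bytes the rest of the file would have to move. The struct, the function name and the choice to append at the end of the segment are all assumptions made for illustration.

```c
#include <stdint.h>

/* Round `v` up to the next multiple of `align` (a power of two). */
static uint64_t align_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* Hypothetical description of where the new code would go. */
struct inject_plan {
    uint64_t code_vaddr;   /* address the injected code will run at */
    uint64_t new_seg_size; /* grown size of the text segment */
    uint64_t file_shift;   /* bytes to shift everything after the segment */
};

/*
 * seg_vaddr, seg_size : p_vaddr and p_filesz of the text segment
 * code_size           : size of the code to append
 * code_align          : alignment for the code (power of two)
 * page_size           : page size used for segment alignment (power of two)
 */
struct inject_plan plan_injection(uint64_t seg_vaddr, uint64_t seg_size,
                                  uint64_t code_size, uint64_t code_align,
                                  uint64_t page_size)
{
    struct inject_plan plan;
    uint64_t seg_end = seg_vaddr + seg_size;

    plan.code_vaddr = align_up(seg_end, code_align);
    plan.new_seg_size = plan.code_vaddr + code_size - seg_vaddr;
    /* Shifting by whole pages keeps p_offset and p_vaddr congruent
     * modulo the page size for every later segment. */
    plan.file_shift = align_up(plan.new_seg_size - seg_size, page_size);
    return plan;
}
```

This sketch does not check whether the grown segment would collide in memory with the segment that follows it, and it leaves out the updates to program headers, section headers and the symbol table that the answer describes.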