How to change interpreter path and pass command li

Posted 2020-07-09 06:41

Question:

Here is a minimal example for an "executable" shared library (assumed file name: mini.c):

// Interpreter path is different on some systems
//+definitely different for 32-Bit machines

const char my_interp[] __attribute__((section(".interp"))) 
    = "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2";

#include <stdio.h>
#include <stdlib.h>

int entry() {
    printf("WooFoo!\n");
    exit (0);
}

Compile it with, e.g., gcc -fPIC -o mini.so -shared -Wl,-e,entry mini.c. "Running" the resulting .so then looks like this:

confus@confusion:~$ ./mini.so
WooFoo!

My question is now:
How do I have to change the above program to pass command line arguments to a call of the .so-file? An example shell session after the change might e.g. look like this:

confus@confusion:~$ ./mini.so 2 bar
1: WooFoo! bar!
2: WooFoo! bar!
confus@confusion:~$ ./mini.so 3 bla
1: WooFoo! bla!
2: WooFoo! bla!
3: WooFoo! bla!

It would also be nice to detect at compile time whether the target is a 32-bit or 64-bit binary and change the interpreter string accordingly. Otherwise one gets an "Accessing a corrupted shared library" warning. Something like:

#ifdef SIXTY_FOUR_BIT
    const char my_interp[] __attribute__((section(".interp"))) = "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2";
#else
    const char my_interp[] __attribute__((section(".interp"))) = "/lib/ld-linux.so.2";
#endif

Or even better, to detect the appropriate path fully automatically to ensure it is right for the system the library is compiled on.

Answer 1:

How do I have to change the above program to pass command line arguments to a call of the .so-file?

When you run your shared library, argc and argv will be passed to your entry function on the stack.

The problem is that the calling convention used when you compile your shared library on x86_64 linux is going to be that of the System V AMD64 ABI, which doesn't take arguments on the stack but in registers.

You'll need some ASM glue code that fetches the arguments from the stack and puts them into the right registers.

Here's a simple .asm file you can save as entry.asm and just link with:

global _entry
extern entry, _GLOBAL_OFFSET_TABLE_

section .text
BITS 64

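_entry:
        mov rdi, [rsp]          ; argc is the first word on the initial stack
        mov rsi, rsp
        add rsi, 8              ; argv starts one word above argc
        call .getGOT
.getGOT:
        pop rbx
        add rbx,_GLOBAL_OFFSET_TABLE_+$$-.getGOT wrt ..gotpc
        call entry wrt ..plt    ; call (not jmp) so entry sees the stack alignment the ABI expects; entry must not return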

That code copies the arguments from the stack into the appropriate registers, and then calls your entry function in a position-independent way.
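To make the layout the glue code relies on more concrete, here is a minimal C sketch that reads argc, argv and envp from a pointer to the initial stack. The struct start_args and the function read_initial_stack are hypothetical names introduced only for illustration; the sketch assumes the usual Linux layout (argc, then the argv pointers, a NULL, then the environment pointers) and is a model of what the assembly does, not a replacement for it.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container for what sits on the initial process stack. */
struct start_args {
    int    argc;
    char **argv;
    char **envp;
};

/*
 * sp points at the word holding argc, i.e. where rsp points at the ELF
 * entry point. The stack is modelled as an array of pointer-sized words.
 */
static struct start_args read_initial_stack(char **sp) {
    struct start_args a;
    a.argc = (int)(intptr_t)sp[0];   /* mov rdi, [rsp]             */
    a.argv = sp + 1;                 /* mov rsi, rsp ; add rsi, 8  */
    a.envp = a.argv + a.argc + 1;    /* skip argv's NULL terminator */
    return a;
}
```

On a 32-bit target the same model applies with 4-byte words, which is why the 32-bit start file adds 4 instead of 8.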

You can then just write your entry as if it was a regular main function:

// Interpreter path is different on some systems
//+definitely different for 32-Bit machines

const char my_interp[] __attribute__((section(".interp")))
    = "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2";

#include <stdio.h>
#include <stdlib.h>

int entry(int argc, char* argv[]) {
    printf("WooFoo! Got %d args!\n", argc);
    exit (0);
}

And this is how you would then compile your library:

nasm entry.asm -f elf64
gcc -fPIC -o mini.so -shared -Wl,-e,_entry mini.c entry.o

The advantage is that you won't have inline asm statements mixed with your C code; instead, your real entry point is cleanly abstracted away in a start file.
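With argc and argv available, the shell session from the question could be reproduced roughly as follows. This is only a sketch: repeat_count and woofoo are hypothetical helpers, and treating a missing or non-numeric count as zero repetitions is an assumption, since the question does not say what should happen in that case.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: repetitions requested in argv[1]; 0 if missing or invalid. */
static long repeat_count(int argc, char *argv[]) {
    if (argc < 3)
        return 0;
    char *end;
    long n = strtol(argv[1], &end, 10);
    return (end != argv[1] && *end == '\0' && n > 0) ? n : 0;
}

/* Hypothetical helper: prints "i: WooFoo! <word>!" once per repetition. */
static long woofoo(FILE *out, int argc, char *argv[]) {
    long n = repeat_count(argc, argv);
    for (long i = 1; i <= n; i++)
        fprintf(out, "%ld: WooFoo! %s!\n", i, argv[2]);
    return n;
}

/* Reached from the start file, so it must end with exit() instead of returning. */
int entry(int argc, char *argv[]) {
    woofoo(stdout, argc, argv);
    exit(0);
}
```

Built with the same commands as above, ./mini.so 2 bar should then print the two numbered lines shown in the question.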

It would also be nice to detect at compile time whether the target is a 32-bit or 64-bit binary and change the interpreter string accordingly.

Unfortunately, there's no completely clean, reliable way to do that. The best you can do is rely on your preferred compiler having the right defines.

Since you use GCC you can write your C code like this:

#if defined(__x86_64__)
    const char my_interp[] __attribute__((section(".interp")))
        = "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2";
#elif defined(__i386__)
    const char my_interp[] __attribute__((section(".interp")))
        = "/lib/ld-linux.so.2";
#else
    #error Architecture or compiler not supported
#endif

#include <stdio.h>
#include <stdlib.h>

int entry(int argc, char* argv[]) {
    printf("%d: WooFoo!\n", argc);
    exit (0);
}
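Because the defines above are only as reliable as the compiler that sets them, it can help to cross-check them against the pointer size at compile time. The following is a sketch under that assumption; interp_for_target is a hypothetical helper that mirrors the selection above so the result can be inspected, and the _Static_assert lines require a C11 compiler.

```c
#include <stddef.h>

/* Hypothetical helper: same choice as the #if chain above, as a function. */
static const char *interp_for_target(void) {
#if defined(__x86_64__)
    return "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2";
#elif defined(__i386__)
    return "/lib/ld-linux.so.2";
#else
    return NULL;   /* unknown target: caller decides what to do */
#endif
}

/* Refuse to build if the macro and the actual pointer width disagree. */
#if defined(__x86_64__)
_Static_assert(sizeof(void *) == 8, "__x86_64__ set but pointers are not 64-bit");
#elif defined(__i386__)
_Static_assert(sizeof(void *) == 4, "__i386__ set but pointers are not 32-bit");
#endif
```

The check does not verify that the chosen path exists on the build machine; it only catches a mismatch between the architecture macro and the pointer width.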

You also need two different start files.
One for 64-bit:

global _entry
extern entry, _GLOBAL_OFFSET_TABLE_

section .text
BITS 64

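_entry:
        mov rdi, [rsp]          ; argc is the first word on the initial stack
        mov rsi, rsp
        add rsi, 8              ; argv starts one word above argc
        call .getGOT
.getGOT:
        pop rbx
        add rbx,_GLOBAL_OFFSET_TABLE_+$$-.getGOT wrt ..gotpc
        call entry wrt ..plt    ; call (not jmp) so entry sees the stack alignment the ABI expects; entry must not return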

And one for 32bit:

global _entry
extern entry, _GLOBAL_OFFSET_TABLE_

section .text
BITS 32

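_entry:
        mov edi, [esp]          ; argc
        mov esi, esp
        add esi, 4              ; argv
        call .getGOT
.getGOT:
        pop ebx                 ; ebx must hold the GOT address for the PLT call
        add ebx,_GLOBAL_OFFSET_TABLE_+$$-.getGOT wrt ..gotpc
        sub esp, 8              ; padding, assuming esp was 16-byte aligned at process entry
        push esi                ; cdecl pushes right to left: argv first
        push edi                ; then argc
        call entry wrt ..plt    ; call supplies the return-address slot entry expects; entry must not return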

This means you now have two slightly different ways to compile your library, one per target.

For 64bit:

nasm entry.asm -f elf64
gcc -fPIC -o mini.so -shared -Wl,-e,_entry mini.c entry.o -m64

And for 32bit:

nasm entry32.asm -f elf32
gcc -fPIC -o mini.so -shared -Wl,-e,_entry mini.c entry32.o -m32

So to sum it up you now have two start files entry.asm and entry32.asm, a set of defines in your mini.c that picks the right interpreter automatically, and two slightly different ways of compiling your library depending on the target.

So if we really want to go all the way, all that's left is to create a Makefile that detects the right target and builds your library accordingly.
Let's do just that:

# Recipe lines must be indented with a tab character, not spaces.
ARCH := $(shell getconf LONG_BIT)

.PHONY: all build_32 build_64

all: build_$(ARCH)

build_32:
	nasm entry32.asm -f elf32
	gcc -fPIC -o mini.so -shared -Wl,-e,_entry mini.c entry32.o -m32

build_64:
	nasm entry.asm -f elf64
	gcc -fPIC -o mini.so -shared -Wl,-e,_entry mini.c entry.o -m64

And we're done here. Just run make to build your library and let the magic happen.
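One thing to keep in mind is that getconf LONG_BIT describes the machine the Makefile runs on, which may differ from what the compiler targets (for example when cross-compiling). A small C sketch of the same decision, made from the compiler's point of view, could look like this. The names build_target and compiled_long_bits are hypothetical and exist only for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: Makefile target name for a given width of long. */
static const char *build_target(unsigned long_bits) {
    return long_bits == 64 ? "build_64"
         : long_bits == 32 ? "build_32"
         : NULL;                       /* no matching target */
}

/* Width of long as seen by the compiler that builds this file. */
static unsigned compiled_long_bits(void) {
    return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

Comparing build_target(compiled_long_bits()) with the target make actually picked is one way to notice that the host and the compiler disagree.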



Answer 2:

Add

int argc;
char **argv;

asm("mov 8(%%rbp), %0" : "=&r" (argc));
asm("mov %%rbp, %0\n"
    "add $16, %0"      : "=&r" (argv));

to the top of your entry function. On x86_64 platforms, this will give you access to the arguments.

The LWN article that John Bollinger linked to in the comments explains why this code works. You may wonder why this is not required when you write a normal C program, or rather, why it does not suffice to just give your entry function the two usual int argc, char **argv arguments. The entry point of a C program is normally not the main function but an assembly routine provided by glibc. It does some preparation for you, including fetching the arguments from the stack, and eventually (via some intermediate functions) calls your main function. Note that this also means you might run into other problems, since you skip this initialization! For some history, the cdecl Wikipedia page, especially the part on the difference between x86 and x86_64, might be of further interest.