Having trouble wrapping functions in the linux ker

Posted 2019-04-20 12:12

I've written an LKM that adds Trusted Path Execution (TPE) to the kernel:

https://github.com/cormander/tpe-lkm

I run into an occasional kernel OOPS (described later in this question) when I define WRAP_SYSCALLS to 1, and I am at my wit's end trying to track it down.

A little background:

Since the LSM framework doesn't export its symbols, I had to get creative with how I insert the TPE checking into the running kernel. I wrote a find_symbol_address() function that gives me the address of any function I need, and it works very well. I can call functions like this:

int (*my_printk)(const char *fmt, ...);
my_printk = find_symbol_address("printk");
(*my_printk)("Hello, world!\n");

And it works fine. I use this method to locate the security_file_mmap, security_file_mprotect, and security_bprm_check functions.

I then overwrite the start of each of those functions with an asm jump to my own function, which does the TPE check. The problem is that the currently loaded LSM no longer gets to run its hook for that function, because the function has been totally hijacked.

Here is an example of what I do:

int tpe_security_bprm_check(struct linux_binprm *bprm) {

    int ret = 0;

    if (bprm->file) {
            ret = tpe_allow_file(bprm->file);
            /* ret is an int, not a pointer, so compare it directly
               instead of using IS_ERR() */
            if (ret < 0)
                    goto out;
    }

#if WRAP_SYSCALLS
    /* put the original code back, call it, then re-install the jump */
    stop_my_code(&cs_security_bprm_check);

    ret = cs_security_bprm_check.ptr(bprm);

    start_my_code(&cs_security_bprm_check);
#endif

    out:

    return ret;
}

Notice the code inside the #if WRAP_SYSCALLS block (WRAP_SYSCALLS is defined as 0 by default). If it is set to 1, the LSM's hook is called, because I write the original code back over the asm jump and then call that function. With that enabled, however, I run into an occasional kernel OOPS with an "invalid opcode":

invalid opcode: 0000 [#1] SMP 
RIP: 0010:[<ffffffff8117b006>]  [<ffffffff8117b006>] security_bprm_check+0x6/0x310

I don't know what the issue is. I've tried several different types of locking methods (see the inside of start/stop_my_code for details) to no avail. To trigger the kernel OOPS, write a simple bash while loop that endlessly starts a backgrounded "ls" command. After a minute or so, it'll happen.

I'm testing this on a RHEL6 kernel; it also works on Ubuntu 10.04 LTS (2.6.32 x86_64).

While this method has been the most successful so far, I have also tried another one: simply copying the kernel function into a buffer I allocated with kmalloc. When I try to execute the copy, I get: kernel tried to execute NX-protected page - exploit attempt? (uid: 0). If anyone can tell me how to kmalloc space and have it marked as executable, that would also help me solve the above problem.

Any help is appreciated!

1 answer
何必那么认真
#2 · 2019-04-20 12:55

1. It seems that the beginning of security_bprm_check() is not completely restored before the function is called. The oops happens at security_bprm_check+0x6, i.e. right after the jump you placed there, so part of the jump is apparently still in place at that moment. I cannot say yet why this happens.

Take a look at the implementation of Kernel Probes (KProbes) on x86; it may give you some hints. See also the description of KProbes for details. KProbes need to patch and restore almost arbitrary pieces of kernel code in a safe way to do their work.
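To picture the state described in item 1, here is a small user-space model, not kernel code. It only assumes that the restore is a plain multi-byte copy, which is not atomic with respect to another CPU that enters the function while the copy is in progress. The struct, the function names and PATCH_LEN are hypothetical and are not taken from tpe-lkm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATCH_LEN 12 /* hypothetical size of the overwritten prologue */

/* Hypothetical bookkeeping for one patched function. */
struct patch_site {
    uint8_t *target;          /* first byte of the patched function */
    uint8_t  orig[PATCH_LEN]; /* saved original bytes */
    uint8_t  jump[PATCH_LEN]; /* bytes of the injected jump */
};

enum site_state { SITE_ORIGINAL, SITE_JUMP, SITE_MIXED };

/* Copy back only the first `done` original bytes: this is what another
 * CPU could observe if it ran the function midway through the restore. */
static void restore_partial(struct patch_site *s, size_t done)
{
    memcpy(s->target, s->orig, done);
}

/* Report whether the site holds the original code, the jump, or a mix. */
static enum site_state site_state(const struct patch_site *s)
{
    if (memcmp(s->target, s->orig, PATCH_LEN) == 0)
        return SITE_ORIGINAL;
    if (memcmp(s->target, s->jump, PATCH_LEN) == 0)
        return SITE_JUMP;
    return SITE_MIXED; /* neither instruction stream is intact */
}
```

In the mixed state the bytes decode as neither the original prologue nor the jump, which is the kind of situation in which an "invalid opcode" a few bytes into the function could show up.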

2. Now to the other approach that you mentioned, copying the function. The following is a bit of a hack and would be frowned upon by the kernel developers, but if there is no other way, it might help.

You can allocate the memory for the copied functions from the same area that holds the code of the kernel modules. That area should be executable by default. Again, KProbes use this trick to allocate their detour buffers.

That memory is allocated by module_alloc() and freed by module_free(). These functions are of course not exported, but you can find their addresses in the same way as you do for security_file_mmap(), etc. Just out of curiosity, you are using kallsyms_on_each_symbol(), right?

If you allocate memory this way, this could also help avoid another not so obvious problem. On x86-64, the memory address areas available for kmalloc and for the modules' code are located quite far away from each other (see Documentation/x86/x86_64/mm.txt), beyond the reach of any relative jump. If the memory is mapped to the modules' address area, you can use near relative jumps and calls to call the copied functions. A similar problem with RIP-relative addressing is also avoided this way.
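The reach limit can be checked with plain arithmetic. The sketch below is user-space C, not kernel code, and the helper name is made up for illustration. It encodes a 5-byte x86 `jmp rel32` (opcode 0xE9) and refuses targets that are more than a signed 32-bit displacement away.

```c
#include <stdint.h>

#define JMP_REL32_LEN 5 /* opcode 0xE9 + 32-bit displacement */

/* Encode `jmp rel32` for a jump stored at address `from` that should
 * land on `to`. The displacement is relative to the end of the jump.
 * Returns 0 on success, -1 if `to` is out of reach of a rel32 jump. */
static int encode_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                            uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - (from + JMP_REL32_LEN));
    uint32_t d32;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    d32 = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    out[1] = (uint8_t)(d32 & 0xff);         /* x86 stores the */
    out[2] = (uint8_t)((d32 >> 8) & 0xff);  /* displacement   */
    out[3] = (uint8_t)((d32 >> 16) & 0xff); /* little-endian  */
    out[4] = (uint8_t)((d32 >> 24) & 0xff);
    return 0;
}
```

As an illustration, take security_bprm_check at 0xffffffff8117b000 (from the oops above). A jump placed at a hypothetical module-area address such as 0xffffffffa0001000 encodes fine, while one placed at a hypothetical direct-mapping address such as 0xffff880012345000 is rejected. Both example source addresses are assumptions picked from the ranges that mm.txt lists for kernels of that era.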

EDIT: Note that on x86, if you copy a piece of code to a different memory area and want it to run there, some changes to that code may be necessary. At the very least you need to fix up the relative calls and jumps that transfer control outside of the copied code (e.g. calls to other functions), as well as the instructions that use RIP-relative addressing.
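For a relative call or jump whose target lies outside the copied code, the fix-up is simple arithmetic: the target stays where it is, so the displacement has to change by exactly the distance the instruction moved. A minimal user-space sketch, with a hypothetical helper name, could look like this. Finding the instructions that need this treatment requires decoding the code and is not shown.

```c
#include <stdint.h>

/* Recompute a 32-bit relative displacement for an instruction that was
 * copied from address `old_insn` to address `new_insn`.
 *
 * Only use this for targets OUTSIDE the copied region; branches that
 * stay inside the copy move together with it and need no change.
 *
 * Returns 0 and stores the result in *new_disp, or -1 if the target is
 * no longer reachable with a 32-bit displacement. */
static int relocate_rel32(int32_t old_disp, uint64_t old_insn,
                          uint64_t new_insn, int32_t *new_disp)
{
    /* target = old_insn + len + old_disp = new_insn + len + new_disp,
     * hence new_disp = old_disp + (old_insn - new_insn). */
    int64_t d = (int64_t)old_disp + (int64_t)(old_insn - new_insn);

    if (d < INT32_MIN || d > INT32_MAX)
        return -1;

    *new_disp = (int32_t)d;
    return 0;
}
```

The same formula applies to the displacement of a RIP-relative memory operand, since it is also measured from the end of the instruction.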

Apart from that, there may be other structures in the code that need to be fixed up. For example, the compiler might have optimized some or even all switch statements into a jump via a table. That is, the addresses of the code blocks for each case are kept in a table in memory and the switch variable is the index into that table. This way, instead of many comparisons, the code executes something like jmp *<table_start>(,%reg,N), where N is the size of a pointer in bytes. That is just a jump to the address stored in the appropriate element of the table. Because such tables are created for the code before you copy it, a fix-up may be necessary; otherwise such jumps will take the execution back to the original piece of code rather than to the copy.
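A rough sketch of such a table fix-up follows, again as plain user-space C with a hypothetical helper name. It assumes that the table entries are absolute 64-bit code addresses and that you work on your own copy of the table, since the original table is still used by the original function. The copied code would also have to be pointed at the new table, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite every entry of a (copied) jump table that points into the
 * original function [old_start, old_start + len) so that it points to
 * the same offset inside the copy at new_start.
 * Returns the number of entries that were changed. */
static size_t fixup_jump_table(uint64_t *table, size_t n,
                               uint64_t old_start, size_t len,
                               uint64_t new_start)
{
    size_t fixed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Entries outside the original function are left untouched. */
        if (table[i] >= old_start && table[i] - old_start < len) {
            table[i] = new_start + (table[i] - old_start);
            fixed++;
        }
    }
    return fixed;
}
```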
