Can the sys_execve() system call in the Linux kern

Posted 2020-03-29 12:21

Question:

Should sys_execve() in kernel-level code receive an absolute or a relative path for the filename parameter?

Answer 1:

sys_execve can take either absolute or relative paths.

Let's verify it in the following ways:

  • experiment with a raw system call
  • read the kernel source
  • run GDB on kernel + QEMU to verify our source analysis

Experiment

main.c

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    syscall(__NR_execve, "../main2.out", NULL, NULL);
}

main2.c

#include <stdio.h>

int main(void) {
    puts("hello main2");
}

Compile and run:

gcc -o main.out main.c
gcc -o ../main2.out main2.c
./main.out

Output:

hello main2

Tested in Ubuntu 16.10.
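
If you do not want to compile two programs, the same experiment can be sketched with a single helper. This is only an illustration under a few assumptions: run_sh_relative is a hypothetical name, it uses the libc execve() wrapper instead of the raw syscall, and it assumes that /bin/sh exists on the test machine. The child changes into a directory, then executes the shell through a relative path that contains "..".

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: chdir() into `dir`, then execve() the shell found
 * at `relpath` (a path relative to `dir`) and let it run `cmd`.
 * Returns the child's exit status, 126 if chdir() failed, 127 if execve()
 * failed, or -1 if fork()/waitpid() failed.
 */
int run_sh_relative(const char *dir, const char *relpath, const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the relative path is resolved against the new cwd. */
        char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
        char *const envp[] = { NULL };
        if (chdir(dir) != 0)
            _exit(126);
        execve(relpath, argv, envp);
        _exit(127); /* only reached if execve() failed */
    }

    /* Parent: wait for the child and report its exit status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_sh_relative("/bin", "../bin/sh", "exit 7") should return 7 if the kernel resolves the ".." component against the working directory of the child.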

Kernel source

First, go to the root of the kernel tree and run:

git grep '"\.\."' fs

We focus on fs because we know that execve is defined there.

This immediately gives results like https://github.com/torvalds/linux/blob/v4.9/fs/namei.c#L1759, which clearly indicate that the kernel knows about ..:

/*
 * "." and ".." are special - ".." especially so because it has
 * to be able to know about the current root directory and
 * parent relationships.
 */

We then look at the definition of execve at https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1869, and the first thing it does is call getname() on the input path:

SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    return do_execve(getname(filename), argv, envp);
}

getname is defined in fs/namei.c, which is the file where the above ".." quote came from.

I haven't bothered to follow the full call path. Note that getname() itself only copies the path string from user space, so the .. resolution has to happen later, when the path is actually looked up.

follow_dotdot in the same file looks especially promising.
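
To make concrete what resolving ".." against the current directory and the root means, here is a purely lexical user-space sketch. It is not the kernel's algorithm: the real lookup walks dentries and has to deal with symlinks and mount points, which this ignores. The names resolve_lexically and apply_components are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Apply the components of `path` to the absolute path held in `out`
 * (current length *len, capacity cap). "." is skipped, ".." drops the
 * last component but never climbs above the root "/".
 * Returns 0 on success, -1 if `out` is too small.
 */
static int apply_components(char *out, size_t *len, size_t cap, const char *path)
{
    while (*path) {
        while (*path == '/')                 /* skip separators */
            path++;
        size_t clen = strcspn(path, "/");    /* length of this component */
        if (clen == 0)
            break;                           /* only trailing slashes left */

        if (clen == 1 && path[0] == '.') {
            /* ".": stay where we are */
        } else if (clen == 2 && path[0] == '.' && path[1] == '.') {
            /* "..": remove the last component, clamped at the root */
            while (*len > 1 && out[*len - 1] != '/')
                (*len)--;
            if (*len > 1)
                (*len)--;                    /* drop the separating '/' */
        } else {
            size_t need = *len + (*len > 1 ? 1 : 0) + clen + 1;
            if (need > cap)
                return -1;
            if (*len > 1)
                out[(*len)++] = '/';
            memcpy(out + *len, path, clen);
            *len += clen;
        }
        out[*len] = '\0';
        path += clen;
    }
    return 0;
}

/* Resolve `rel` against the absolute directory `cwd`, lexically only. */
int resolve_lexically(const char *cwd, const char *rel, char *out, size_t cap)
{
    size_t len = 1;
    if (cap < 2)
        return -1;
    out[0] = '/';
    out[1] = '\0';
    /* An absolute `rel` ignores cwd; a relative one starts from it. */
    if (rel[0] != '/' && apply_components(out, &len, cap, cwd) != 0)
        return -1;
    return apply_components(out, &len, cap, rel);
}
```

With cwd "/d" and the path "../b.out", this yields "/b.out". With cwd "/" the leading ".." components are absorbed by the root, which is the "has to be able to know about the current root directory" part of the kernel comment.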

GDB + QEMU

Reading the source is great, but on its own it never tells us for certain that a given code path is actually taken.

There are two ways to check that:

  • printk, recompile, printk, recompile
  • GDB + QEMU. Setup is a bit rougher, but once done it is pure bliss

First get the setup working as explained at: How to debug the Linux kernel with GDB and QEMU?

Now, we will use two programs:

init.c

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    chdir("d");
    syscall(__NR_execve, "../b.out", NULL, NULL);
}

b.c

#include <unistd.h>
#include <stdio.h>

int main(void) {
    puts("hello");
    sleep(0xFFFFFFFF);
}

And the rootfs file structure should be like:

init
b.out
d/

Once GDB is running, we will do:

b sys_execve
c
x/s filename

This outputs ../b.out, so we know we have stopped in the right syscall.

Now, the interesting ".." comment we saw before was in a function called walk_component, so let's see if that is called:

b walk_component
c

And yes, we hit it.

If we read a bit into it, we see a call:

error = handle_dots(nd, nd->last_type);

which sounds promising and does:

static inline int handle_dots(struct nameidata *nd, int type)
{
    if (type == LAST_DOTDOT) {
        if (!nd->root.mnt)
            set_root(nd);
        if (nd->flags & LOOKUP_RCU) {
            return follow_dotdot_rcu(nd);
        } else
            return follow_dotdot(nd);
    }
    return 0;
}

So what is it that sets this type (nd->last_type) to LAST_DOTDOT?

Well, search the source for = LAST_DOTDOT, and we find that link_path_walk is doing it.

And even better: bt says that link_path_walk is a caller, so it will be easy to understand what is going on now.

In link_path_walk, we see:

if (name[0] == '.') switch (hashlen_len(hash_len)) {
    case 2:
        if (name[1] == '.') {
            type = LAST_DOTDOT;

and thus the mystery is solved: the code never compares against the string "..", which is what foiled our previous greps!

Instead, the two dots were being checked separately (because . is a subcase).
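
The same check can be reproduced in user space. The sketch below is modeled on the excerpt from link_path_walk, with some assumptions: the enum and the function classify_component are hypothetical names, the component length is passed in explicitly instead of coming from hashlen_len(hash_len), and the single-dot case is filled in the way the "subcase" remark suggests.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's LAST_* component types. */
enum comp_type { COMP_NORM, COMP_DOT, COMP_DOTDOT };

/*
 * Classify one path component of length `len` starting at `name`.
 * The dots are tested one character at a time, so the string ".."
 * never appears in the code.
 */
enum comp_type classify_component(const char *name, size_t len)
{
    enum comp_type type = COMP_NORM;

    if (name[0] == '.') switch (len) {
        case 2:
            if (name[1] == '.')
                type = COMP_DOTDOT;   /* exactly ".." */
            break;
        case 1:
            type = COMP_DOT;          /* exactly "." */
    }
    return type;
}
```

For the path from the experiment, classify_component("../b.out", 2) reports COMP_DOTDOT for the first component, while names such as ".a" or "..." fall through as normal components.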