who creates map in BPF

2019-03-31 12:20发布

问题:

After reading man bpf and a few other sources of documentation, I was under impression that a map can be only created by user process. However the following small program seems to magically create bpf map:

struct bpf_map_def SEC("maps") my_map = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(u32),
        .value_size = sizeof(long),
        .max_entries = 10,
};

SEC("sockops")
int my_prog(struct bpf_sock_ops *skops)
{
   u32 key = 1;
   long *value;
   ...

   value = bpf_map_lookup_elem(&my_map, &key);
   ...
   return 1;
}

So I load the program with the kernel's tools/bpf/bpftool and also verify that program is loaded:

$ bpftool prog show
1: sock_ops  name my_prog  tag f3a3583cdd82ae8d
        loaded_at Jan 02/18:46  uid 0
        xlated 728B  not jited  memlock 4096B

$ bpftool map show
1: array  name my_map  flags 0x0
        key 4B  value 8B  max_entries 10  memlock 4096B

Of course the map is empty. However, removing bpf_map_lookup_elem from the program results in no map being created.

UPDATE I debugged it with strace and found that in both cases, i.e. with bpf_map_lookup_elem and without it, bpftool does invoke bpf(BPF_MAP_CREATE, ...) and it apparently succeeds. Then, in case of bpf_map_lookup_elem left out, I strace on bpftool map show, and bpf(BPF_MAP_GET_NEXT_ID, ..) immediately returns ENOENT, and it never gets to dump a map. So obviously something is not completing the map creation.

So I wonder if this is expected behavior?

Thanks.

回答1:

As explained by antiduh, and confirmed with your strace checks, bpftool is the user space program creating the maps in this case. It calls function bpf_prog_load() from libbpf (under tools/lib/bpf/), which in turn ends up performing the syscall. Then the program is pinned at the desired location (under a bpf virtual file system mount point), so that it is not unloaded when bpftool returns. Maps are not pinned.

Regarding map creation, the magic bits also take place in libbpf. When bpf_prog_load() is called, libbpf receives the name of the object file as an argument. bpftool does not ask to load this specific program or that specific map; instead, it provides the object file and libbpf has to deal with it. So the functions in libbpf parse this ELF object file, and eventually find a number of sections corresponding to maps and programs. Then it tries to load the first program.

Loading this program includes the following steps:

CHECK_ERR(bpf_object__create_maps(obj), err, out);
CHECK_ERR(bpf_object__relocate(obj), err, out);
CHECK_ERR(bpf_object__load_progs(obj), err, out);

In other words: start by creating all maps we found in the object file. Then perform map relocation (i.e. associate map index to eBPF instructions), and at last load program instructions.

So regarding your question: in both cases, with and without bpf_map_lookup_elem(), maps are created with a bpf(BPF_MAP_CREATE, ...) syscall. After that, relocation happens, and program instructions are adapted to point, if needed, to the newly created maps. Then once all steps are finished and the program is loaded, bpftool exits. The eBPF program should be pinned, and still loaded in the kernel. As far as I understand, if it does use the maps (if bpf_map_lookup_elem() was used), then maps are still referenced by a loaded program, and are kept in the kernel. On the other hand, if the program does not use the maps, then there is nothing more to hold them back, so the maps are destroyed when the file descriptors held by bpftool are closed, when bpftool returns.

So in the end, when bpftool has completed, you have a map loaded in the kernel if the program uses it, but no map if no program would rely on it. Sounds like expected behaviour in my opinion; but please do ping one way or another if you experience strange things with bpftool, I'm one of the guys working on the utility. One last generic observation: maps can also be pinned and remain in the kernel even if no program uses them, should one need to keep them around.



回答2:

I was under impression that a map can be only created by user process.

You're completely right - user programs are the ones that invoke the bpf system call in order to load eBPF programs and create eBPF maps.

And you did just that:

So I load the program with tools/bpf/bpftool and ...

Your bpftool program is the user process that is invoking the bpf syscall, and thus is the user process that is creating the eBPF map.

BPF programs don't have to be unloaded when the user program that created it quits - bpftool likely uses this mechanism.

Some relevant bits from the man page to connect the dots:

A user process can create multiple maps ... and access them via file descriptors.

Generally, eBPF programs are loaded by the user process and automatically unloaded when the process exits. In some cases ... the program will continue to stay alive inside the kernel even after the process that loaded the program exits.

Each eBPF program is a set of instructions that is safe to run until its completion. ... During verification, the kernel increments reference counts for each of the maps that the eBPF program uses, so that the attached maps can't be removed until the program is unloaded.