I'm studying some kernel code and trying to understand how the data structures are linked together. I know the basic idea of how a scheduler works, and what a PID is. Yet I have no idea what a namespace is in this context, and can't figure out how all of those work together.
I have read some explanations (including parts of O'Reilly's "Understanding the Linux Kernel"). I understand that the same PID can end up belonging to two processes, for example because one of them terminated and its ID was reallocated. But I can't figure out how all of this is implemented.
So:
- What is a namespace in this context?
- What is the relation between task_struct and pid_namespace? (I already figured out that it has to do with pid_t, but I don't know how.)
Some references:
- Definition of pid_namespace
- Definition of task_struct
- Definition of upid (see also pid just beneath it)
Perhaps these links might help:
- PID namespaces in operation
- A brief introduction to PID namespaces (this one comes from a sysadmin)
After going through the second link, it becomes clear that namespaces are a great way to isolate resources. And in any OS, Linux included, processes are among the most crucial resources there are. In the author's own words:
Yes, that’s it, with this namespace it is possible to restart PID
numbering and get your own “1” process. This could be seen as a
“chroot” in the process identifier tree. It’s extremely handy when you
need to deal with pids in day to day work and are stuck with 4 digits
numbers…
So you essentially create your own private process tree and then assign it to a specific user and/or to a specific task. Within this tree, processes need not worry about their PIDs conflicting with those outside this 'container'. Hence it is as good as handing this tree over to a different 'root' user altogether. That fine fellow has done a wonderful job of explaining things, with a nice little example to top it off, so I won't repeat it here.
As far as the kernel is concerned, I can give you a few pointers to get you started. I am not an expert here, but I hope this helps to some extent.
This LWN article describes the older and the newer way of looking at PIDs. In its own words:
All the PIDs that a task may have are described in the struct pid. This
structure contains the ID value, the list of tasks having this ID, the
reference counter and the hashed list node to be stored in the hash
table for a faster search. A few more words about the lists of tasks.
Basically a task has three PIDs: the process ID (PID), the process
group ID (PGID), and the session ID (SID). The PGID and the SID may be
shared between the tasks, for example, when two or more tasks belong to
the same group, so each group ID addresses more than one task. With the
PID namespaces this structure becomes elastic. Now, each PID may have
several values, with each one being valid in one namespace. That is, a
task may have PID of 1024 in one namespace, and 256 in another. So, the
former struct pid changes. Here is how the struct pid looked like
before introducing the PID namespaces:
struct pid {
atomic_t count; /* reference counter */
int nr; /* the pid value */
struct hlist_node pid_chain; /* hash chain */
struct hlist_head tasks[PIDTYPE_MAX]; /* lists of tasks */
struct rcu_head rcu; /* RCU helper */
};
And this is how it looks now:
struct upid {
int nr; /* moved from struct pid */
struct pid_namespace *ns; /* the namespace this value
* is visible in */
struct hlist_node pid_chain; /* moved from struct pid */
};
struct pid {
atomic_t count;
struct hlist_head tasks[PIDTYPE_MAX];
struct rcu_head rcu;
int level; /* the number of upids */
struct upid numbers[0];
};
As you can see, the struct upid now represents the PID value -- it is stored in the hash and has the PID value. To convert the struct pid to the PID or vice versa one may use a set of helpers like task_pid_nr(), pid_nr_ns(), find_task_by_vpid(), etc.
Though a bit dated, this information is good enough to get you started. There's one more important structure that deserves a mention here: struct nsproxy. This structure is the focal point of all things namespace for the processes associated with it. It holds pointers to the namespaces (mount, UTS, IPC, PID, network) that those processes use. For PIDs, it contains pid_ns_for_children, a pointer to the PID namespace that this process's children will use. The PID namespace of the current process itself is found using task_active_pid_ns().
Within struct task_struct, there is a namespace proxy pointer aptly called nsproxy, which points to this process's struct nsproxy. If you trace the steps needed to create a new process, you can find the relationships between task_struct, struct nsproxy and struct pid.
A new process in Linux is always forked from an existing process, and its image is then usually replaced using execve (or a similar function from the exec family). In the kernel, fork is handled by do_fork, which invokes copy_process.
As part of copying the parent process, the following important things happen:
- The task_struct is first duplicated using dup_task_struct.
- The parent process's namespaces are handled by copy_namespaces. If the clone flags request new namespaces (CLONE_NEWPID and friends), this creates a new nsproxy structure for the child and points the child's nsproxy pointer at it. Otherwise, the child simply shares the parent's nsproxy, whose reference count is incremented.
For a non-INIT process (INIT being the original global PID, i.e. the first process spawned at boot), a struct pid is allocated for the newly forked process using alloc_pid, which is passed the child's nsproxy->pid_ns_for_children. A short snippet from this function:
/* inside alloc_pid's loop over namespace levels: tmp is the namespace at level i */
nr = alloc_pidmap(tmp);
if (nr < 0)
        goto out_free;

pid->numbers[i].nr = nr;
pid->numbers[i].ns = tmp;
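This fills in a upid entry with a new PID number and the namespace in which that number is valid. alloc_pid runs this in a loop. It starts at the new task's own namespace and walks up through each parent namespace, so the task receives a number at every level.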
Further on in copy_process, this newly allocated struct pid is tied to the corresponding task_struct. Its global ID (the PID number as seen from the INIT namespace) is obtained with pid_nr and stored in the pid field of task_struct.
In the final stages of copy_process, attach_pid establishes the link between the task_struct and this new struct pid. It does so through the pids field of task_struct, which is an array of struct pid_link indexed by PID type (PID, PGID, SID).
There's a lot more to it, but I hope this at least gives you a head start.
NOTE: I am referring to the latest (as of now) kernel version viz. 3.17.2.