pid_namespace

zhcy周 · published 2022-03-29 19:06:19

1. Main members of the pid_namespace structure

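pid_namespace isolates PID resources between containers. A process inside a container can only see the PIDs that belong to that container, while a higher-level (ancestor) pidns can see the processes of the lower-level (descendant) pidns beneath it.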

struct pid_namespace {
	struct pidmap pidmap[PIDMAP_ENTRIES];
	int last_pid;
	struct kmem_cache *pid_cachep;
	unsigned int level;
	struct pid_namespace *parent;
};

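pidmap: the bitmap used to allocate PID numbers

last_pid: the PID number that was allocated most recently

pid_cachep: the slab cache used to allocate struct pid objects

level: the depth of this pidns in the hierarchy

parent: the pidns one level above this one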

2. create_pid_namespace

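When a child process is created by clone() with the CLONE_NEWPID flag set, the kernel creates a new pidns for the child (plain fork() carries no flags, so the flag has to come from clone()). The kernel call path through the fork code is: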

fork
    copy_process
        copy_namespaces
            copy_pid_ns
                create_pid_namespace
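
As a hedged user-space illustration of what triggers this path, the sketch below calls clone() with CLONE_NEWPID and reports the PID the child sees for itself. The helper name pid_seen_in_new_pidns and the stack size are choices made for this example and are not from the original article. The call needs CAP_SYS_ADMIN, so the helper returns -1 when clone() is refused.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)   /* arbitrary size for this sketch */

/* Runs inside the new pidns and reports its own PID via the exit status. */
static int child_fn(void *arg)
{
	(void)arg;
	return (int)getpid();
}

/*
 * Returns the PID the child saw for itself, or -1 if clone() or waitpid()
 * failed (for example EPERM when the caller lacks CAP_SYS_ADMIN).
 */
int pid_seen_in_new_pidns(void)
{
	char *stack = malloc(CHILD_STACK_SIZE);
	if (!stack)
		return -1;

	/* The stack grows down on common architectures, so pass its top. */
	pid_t child = clone(child_fn, stack + CHILD_STACK_SIZE,
			    CLONE_NEWPID | SIGCHLD, NULL);
	if (child == -1) {
		free(stack);
		return -1;
	}

	int status = 0;
	pid_t waited = waitpid(child, &status, 0);
	free(stack);
	if (waited == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

When the call is permitted, the child is the first process of the new pidns, so the value it reports is its PID inside that namespace rather than the PID the parent got back from clone().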

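Whenever a process is created, its struct pid is allocated from the pid_cachep slab cache of the task's pidns. The object size of that slab cache is sizeof(struct pid) + (nr_ids - 1) * sizeof(struct upid), so it depends on the level at which the process's pidns sits: a deeper pidns needs one extra struct upid per ancestor level.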
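
The size formula can be worked through with a small sketch. The structures below are simplified stand-ins (their field lists are assumptions for illustration), and the sketch assumes nr_ids = level + 1, that is, one struct upid per namespace level from the top down to the task's own pidns.

```c
#include <stddef.h>

struct pid_namespace;                 /* opaque in this sketch */

/* Simplified stand-in: one PID value as seen from one namespace. */
struct upid {
	int nr;
	struct pid_namespace *ns;
};

/* Simplified stand-in: the first upid is embedded, the rest follow it. */
struct pid {
	unsigned int level;
	struct upid numbers[1];
};

/* Object size for a pidns at the given level (assumes nr_ids = level + 1). */
size_t pid_object_size(unsigned int level)
{
	size_t nr_ids = (size_t)level + 1;

	return sizeof(struct pid) + (nr_ids - 1) * sizeof(struct upid);
}
```

For level 0 the result is just sizeof(struct pid); every additional level adds exactly one sizeof(struct upid).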

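The most important member of struct pid is the numbers array, which records the process's id in every namespace it is visible from; this id is the value that getpid() returns inside that namespace. numbers[0] belongs to the top-level namespace (level = 0), numbers[1] to the namespace at level = 1, and so on. alloc_pid starts at the lowest-level (deepest) ns and walks up through the parents, requesting one PID number from the pidmap bitmap of each ns in turn.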
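
The allocation order described above can be sketched as a user-space model. This is not the kernel implementation: the names ending in _model, the MAX_PIDS limit, and the byte array standing in for the pidmap bitmap are all assumptions made for illustration, and the sketch uses a C99 flexible array member instead of the kernel's layout.

```c
#include <stdlib.h>

#define MAX_PIDS 64                      /* tiny stand-in for the pidmap size */

struct pid_namespace {
	unsigned char used[MAX_PIDS];    /* stand-in for the pidmap bitmap */
	int last_pid;
	unsigned int level;
	struct pid_namespace *parent;
};

struct upid {
	int nr;
	struct pid_namespace *ns;
};

struct pid {
	unsigned int level;
	struct upid numbers[];           /* numbers[0] is the top-level ns */
};

/* Take the next free number after last_pid; returns -1 when none is free. */
static int alloc_pidmap_model(struct pid_namespace *ns)
{
	for (int i = 1; i < MAX_PIDS; i++) {
		int nr = (ns->last_pid + i) % MAX_PIDS;

		if (nr == 0 || ns->used[nr])
			continue;        /* 0 is never handed out */
		ns->used[nr] = 1;
		ns->last_pid = nr;
		return nr;
	}
	return -1;
}

/* Allocate one number per level, from the deepest ns up to the root. */
struct pid *alloc_pid_model(struct pid_namespace *ns)
{
	int depth = (int)ns->level;
	struct pid *pid = calloc(1, sizeof(*pid) +
				    (size_t)(depth + 1) * sizeof(struct upid));
	if (!pid)
		return NULL;
	pid->level = ns->level;

	struct pid_namespace *tmp = ns;
	for (int i = depth; i >= 0; i--) {
		int nr = alloc_pidmap_model(tmp);

		if (nr < 0) {
			/* Release the numbers already taken in deeper levels. */
			for (int j = i + 1; j <= depth; j++)
				pid->numbers[j].ns->used[pid->numbers[j].nr] = 0;
			free(pid);
			return NULL;
		}
		pid->numbers[i].nr = nr;
		pid->numbers[i].ns = tmp;
		tmp = tmp->parent;
	}
	return pid;
}
```

In this model, a task created in a level-1 namespace ends up with two numbers: numbers[1] is the PID it sees for itself, and numbers[0] is the PID the top-level namespace sees for it.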

3. setns

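setns moves the caller into a namespace that belongs to some other process. Taking the PID namespace as an example: the setns system call first calls create_new_namespaces, which allocates a struct nsproxy object from the slab cache. Because flags is 0, the new object simply takes over the concrete namespace objects (uts, ipc, mnt, pid, net) that the current process already points to, which amounts to making a copy of the current process's own nsproxy.

ops->install(new_nsproxy, ei->ns) then runs pidns_install, where nsproxy->pid_ns = get_pid_ns(new) points the pid_ns of the new struct nsproxy at the PID namespace of the target process. Finally, the nsproxy pointer of the current process is switched to the new object (switch_task_namespaces).

In summary, setns on a PID namespace allocates a new nsproxy structure, fills it with the uts, ipc, mnt, pid and net namespace objects that the current process already points to, and then redirects its pid_ns pointer to the pid_ns of the target process.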
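
From user space, the kernel path summarized above is reached through setns(2). The following is a minimal sketch: the helper name enter_pid_ns is hypothetical, and the path in the comment uses a placeholder PID. The call requires CAP_SYS_ADMIN, and because only the nsproxy pointer changes, the caller keeps its own PID; only children created afterwards live in the target pidns.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Joins the PID namespace referred to by ns_path, for example
 * "/proc/1234/ns/pid" (1234 is a placeholder PID).
 * Returns 0 on success and -1 on failure with errno set.
 */
int enter_pid_ns(const char *ns_path)
{
	int fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (fd == -1)
		return -1;

	/* CLONE_NEWPID makes setns() reject fds of any other ns type. */
	int rc = setns(fd, CLONE_NEWPID);
	int saved_errno = errno;

	close(fd);
	errno = saved_errno;             /* keep the error from setns() */
	return rc;
}
```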

4. Why pthread_create cannot create threads after setns on a pidns

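Start with the conclusion stated in reference [3]: if the current process has previously used setns to join a new PID namespace, a clone with CLONE_THREAD returns the EINVAL error.

pthread_create is ultimately built on the clone system call, and the flags it passes include both CLONE_SIGHAND and CLONE_THREAD. In other words, once the current process has used setns to join a new PID namespace, it can no longer create new threads with pthread_create.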

 EINVAL CLONE_THREAD was specified in the flags mask, but the
              current process previously called unshare(2) with the
              CLONE_NEWPID flag or used setns(2) to reassociate itself
              with a PID namespace.

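Now look at how the Linux kernel source enforces this. If flags contains CLONE_SIGHAND, the kernel compares the pid_ns in the current task's nsproxy (current->nsproxy->pid_ns) with the pid_ns recorded in the numbers array of the task's struct pid (the value returned by task_active_pid_ns(current)). If the two are not equal, it returns EINVAL. After setns on a PID namespace the two values obviously differ, because setns only changes the nsproxy side.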

/* Abridged from copy_process() in kernel/fork.c */
copy_process {
	/* CLONE_THREAD is only valid together with CLONE_SIGHAND */
	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
		return ERR_PTR(-EINVAL);

	/*
	 * If the new process will be in a different pid or user namespace
	 * do not allow it to share a thread group or signal handlers or
	 * parent with the forking task.
	 */
	if (clone_flags & CLONE_SIGHAND) {
		if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
		    (task_active_pid_ns(current) !=
				current->nsproxy->pid_ns))
			return ERR_PTR(-EINVAL);
	}
}
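
To make the effect of this check concrete, here is a small user-space model of it. The struct task_model type and the function check_clone_flags_model are hypothetical names for this sketch; only the CLONE_* constants and the shape of the two conditions come from the excerpt above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

struct pid_namespace {
	unsigned int level;
};

/* Hypothetical model of the two pidns pointers the kernel compares. */
struct task_model {
	struct pid_namespace *active_pid_ns;   /* from struct pid, fixed at creation */
	struct pid_namespace *nsproxy_pid_ns;  /* nsproxy->pid_ns, changed by setns */
};

/* Mirrors the two checks from the excerpt; returns 0 or -EINVAL. */
int check_clone_flags_model(const struct task_model *cur,
			    unsigned long clone_flags)
{
	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
		return -EINVAL;

	if (clone_flags & CLONE_SIGHAND) {
		if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
		    cur->active_pid_ns != cur->nsproxy_pid_ns)
			return -EINVAL;
	}
	return 0;
}
```

With both pointers equal, thread-style flags pass. Once nsproxy_pid_ns points elsewhere, as after setns, the same flags yield -EINVAL, while a fork-style call without CLONE_SIGHAND is still accepted.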

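After setns on a PID namespace, the process can first clone (fork) a child process. The child inherits the parent's ns, so it is created inside the target pidns and its two pid_ns values match again. The child can then use CLONE_THREAD to create threads.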
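
A minimal sketch of this workaround follows. The helper name spawn_child_with_thread and the empty worker function are assumptions for illustration. The helper is meant to be called after a successful setns on a PID namespace, although it also runs without one. Depending on the glibc version, building it may require the -pthread option.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder thread body for this sketch. */
static void *worker(void *arg)
{
	(void)arg;
	return NULL;
}

/*
 * Forks a child and creates one thread inside it.
 * Returns 0 if the child created and joined its thread, -1 otherwise.
 */
int spawn_child_with_thread(void)
{
	pid_t child = fork();            /* no CLONE_SIGHAND, so the check passes */
	if (child == -1)
		return -1;

	if (child == 0) {
		pthread_t tid;

		/* In the child the pid_ns values agree, so this is allowed. */
		if (pthread_create(&tid, NULL, worker, NULL) != 0)
			_exit(1);
		pthread_join(tid, NULL);
		_exit(0);
	}

	int status = 0;
	if (waitpid(child, &status, 0) == -1)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```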

References:

[1] Notes on a setns problem in golang

[2] Namespace technology

[3] linux org