Linux capabilities explained, and capabilities in containers

ImSEten · published 2021-09-29 16:47:24

1. Capability overview

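Many articles already cover this part, so this post does not explain it at length. The references below are a good place to start.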

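capabilities(7), Linux manual page (the authoritative reference)
Linux Capabilities 入门教程：概念篇 (introductory tutorial: concepts), by 米开朗基杨
Linux Capabilities 入门教程：基础实战篇 (introductory tutorial: basic practice), by 米开朗基杨
Linux Capabilities 入门教程：进阶实战篇 (introductory tutorial: advanced practice), by 米开朗基杨
Linux capability详解 (Linux capabilities explained), by 弥敦道人 on CSDN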

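Before kernel 2.2, Linux distinguished only two kinds of processes for permission checks: privileged processes (euid = 0) and unprivileged processes. A privileged process, typically root or a set-user-ID program, gets full root power over the system.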

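Starting with kernel 2.2, Linux introduced the capabilities mechanism, which splits root's privileges into finer-grained units. When a process attempts a privileged operation, the kernel checks whether the process holds the capability that the operation requires, instead of only checking for an effective UID of 0.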

Run man capabilities to see the individual capabilities.

Linux has five capability sets in total.

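Permitted: the upper bound on the capabilities the process may place in its effective set. Abbreviated P below.
Effective: the capabilities currently in force; this is the set the kernel actually checks. Abbreviated E below.
Inheritable: capabilities that can be carried across execve. Abbreviated I below.
Bounding: the limit on the capabilities a process can gain during execve. Abbreviated B below.
Ambient: capabilities kept across execve of an unprivileged program. Abbreviated A below.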

1.1 Viewing the current user's capabilities

Look at the Cap fields in /proc/$$/status ($$ expands to the PID of the current shell).

Regular user

ubuntu@ubuntu-standard-pc:~$ cat /proc/$$/status | grep Cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	000001ffffffffff
CapAmb:	0000000000000000

root user

root@ubuntu-standard-pc:~# cat /proc/$$/status | grep Cap
CapInh:	0000000000000000
CapPrm:	000001ffffffffff
CapEff:	000001ffffffffff
CapBnd:	000001ffffffffff
CapAmb:	0000000000000000

CapInh corresponds to I above
CapPrm corresponds to P above
CapEff corresponds to E above
CapBnd corresponds to B above
CapAmb corresponds to A above
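
The same fields can be read programmatically. The following is a minimal sketch in Go, not part of the original experiments: the helper name parseCapSets is hypothetical, and the program simply reads /proc/self/status and prints the five masks.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCapSets extracts the Cap* lines from the text of a /proc/<pid>/status
// file and returns them as a map from field name (e.g. "CapEff") to bitmask.
// The function name is an illustrative choice, not an existing API.
func parseCapSets(status string) (map[string]uint64, error) {
	sets := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "Cap") {
			continue
		}
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		// The kernel prints each mask as a hexadecimal number.
		mask, err := strconv.ParseUint(strings.TrimSpace(value), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", name, err)
		}
		sets[name] = mask
	}
	return sets, sc.Err()
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	sets, err := parseCapSets(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, name := range []string{"CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb"} {
		fmt.Printf("%s: %016x\n", name, sets[name])
	}
}
```

Run as a regular user, this should print empty I, P, E, and A masks and a full B mask, like the shell output above.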

1.2 Process capabilities

Below, pP, pI, pE, pB, and pA denote a process's P, I, E, B, and A sets.

First, start a sleep process that sleeps for 100 seconds and runs in the background (the trailing & puts it in the background).

ubuntu@ubuntu-standard-pc:~$ sleep 100 &
[1] 1968

The PID is 1968. /proc/<pid>/status records the state of that process, including its capabilities, so read that file and filter for the capability fields.

If you do not know the PID, list all processes with ps -ef and search the output with grep.
For this example:

ubuntu@ubuntu-standard-pc:~$ ps -ef | head -1; ps -ef | grep sleep
UID        PID  PPID  C STIME TTY          TIME CMD
root      1595  1638  0 10:35 ?        00:00:00 sleep 60
1030775+  1968  1896  0 10:35 pts/1    00:00:00 sleep 100
1030775+  2065 59302  0 10:35 ?        00:00:00 sleep 5
1030775+  2175  1896  0 10:35 pts/1    00:00:00 grep --color=auto sleep

head -1 prints the header row, that is, the UID PID PPID C STIME TTY TIME CMD line.

ubuntu@ubuntu-standard-pc:~$ cat /proc/1968/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

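Only B is populated for this process; every other set is empty. This matches the capabilities of the user's shell. The reason is explained later (it is not a plain copy of all sets).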

Now look at the root user.

ubuntu@ubuntu-standard-pc:~$ sudo -i
root@ubuntu-standard-pc:~# cat /proc/$$/status | grep Cap
CapInh:	0000000000000000
CapPrm:	000001ffffffffff
CapEff:	000001ffffffffff
CapBnd:	000001ffffffffff
CapAmb:	0000000000000000

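For root, only I and A are empty; every other set is full. This matches root's own capabilities shown earlier. Again, the reason is explained later (this is not a plain copy either).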

1.3 Switching users inside a process (calling setuid and setgid)

When a process switches user while it runs (its code calls the setuid and setgid system calls), its capabilities change accordingly.
Kernel source browser (well worth bookmarking)

The kernel code that handles this is:

Location in the kernel tree: security/commoncap.c, line 1087

static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
{
	kuid_t root_uid = make_kuid(old->user_ns, 0);

	if ((uid_eq(old->uid, root_uid) ||
	     uid_eq(old->euid, root_uid) ||
	     uid_eq(old->suid, root_uid)) 				// any of these three: the process used to be root
	     &&
	    (!uid_eq(new->uid, root_uid) &&
	     !uid_eq(new->euid, root_uid) &&
	     !uid_eq(new->suid, root_uid))) 			// all three: the process is no longer root in any UID
	     {
		if (!issecure(SECURE_KEEP_CAPS)) {			// KEEP_CAPS is not set: clear the P and E sets
			cap_clear(new->cap_permitted);
			cap_clear(new->cap_effective);
		}

		/*
		 * Pre-ambient programs expect setresuid to nonroot followed
		 * by exec to drop capabilities.  We should make sure that
		 * this remains the case.
		 */
		cap_clear(new->cap_ambient);				// on root -> non-root, A is always cleared, with or without KEEP_CAPS
	}
	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
		cap_clear(new->cap_effective);				// euid was root and is now non-root: clear E
	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
		new->cap_effective = new->cap_permitted;	// euid was non-root and is now root: E = P
}

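The kernel code above can be summarized as follows:

When a process that was root switches to a non-root user and the KEEP_CAPS flag is not set, the E and P sets are cleared.
If the KEEP_CAPS flag is set, the P set is kept.

In short, whenever a process switches from root to a regular user, E is cleared; whether P survives depends on the KEEP_CAPS flag.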
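
To make the branches easier to follow, here is a small user-space model of the same logic in Go. It is only a sketch: the creds struct and the emulateSetxuid function are simplified, hypothetical stand-ins for the kernel's struct cred and cap_emulate_setxuid, and user namespaces are ignored.

```go
package main

import "fmt"

// creds is a simplified stand-in for the kernel's struct cred: three user IDs
// plus the permitted, effective, and ambient capability bitmasks.
type creds struct {
	uid, euid, suid uint32
	permitted       uint64
	effective       uint64
	ambient         uint64
}

// hasRootID reports whether the real, effective, or saved UID is 0.
func hasRootID(c creds) bool {
	return c.uid == 0 || c.euid == 0 || c.suid == 0
}

// emulateSetxuid mirrors the branches of cap_emulate_setxuid. It takes the old
// credentials, the new credentials (IDs already changed, capability sets still
// copied from the old ones), and the KEEP_CAPS flag.
func emulateSetxuid(old, next creds, keepCaps bool) creds {
	if hasRootID(old) && !hasRootID(next) {
		if !keepCaps {
			next.permitted = 0
			next.effective = 0
		}
		// Ambient is cleared on root -> non-root regardless of KEEP_CAPS.
		next.ambient = 0
	}
	if old.euid == 0 && next.euid != 0 {
		next.effective = 0
	}
	if old.euid != 0 && next.euid == 0 {
		next.effective = next.permitted
	}
	return next
}

func main() {
	const full = uint64(0x1fffffffff) // example mask, chosen for illustration
	root := creds{permitted: full, effective: full}
	user := root
	user.uid, user.euid, user.suid = 1000, 1000, 1000

	fmt.Printf("KEEP_CAPS off: %+v\n", emulateSetxuid(root, user, false))
	fmt.Printf("KEEP_CAPS on:  %+v\n", emulateSetxuid(root, user, true))
}
```

With KEEP_CAPS off both P and E come back empty; with KEEP_CAPS on, P keeps its value and only E is emptied.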

1.3.1 Testing the kernel behavior

The examples in this post are written in Go.

Source file: setid.go

package main

import (
	"fmt"
	"syscall"
	"time"
)

// SetKeepCaps sets the flag that keeps capabilities across a UID change
func SetKeepCaps() error {
	if _, _, err := syscall.RawSyscall(syscall.SYS_PRCTL, syscall.PR_SET_KEEPCAPS, 1, 0); err != 0 {
		return err
	}

	return nil
}

// ClearKeepCaps clears the flag that keeps capabilities across a UID change
func ClearKeepCaps() error {
	if _, _, err := syscall.RawSyscall(syscall.SYS_PRCTL, syscall.PR_SET_KEEPCAPS, 0, 0); err != 0 {
		return err
	}

	return nil
}

func main() {

	fmt.Println("Hello world!")
	fmt.Println("before set, the uid is ", syscall.Getuid())
	fmt.Println("before set, the gid is ", syscall.Getgid())
	fmt.Println("before set, the effective uid is ", syscall.Geteuid())

	fmt.Println("|***********************************|")

	if err := SetKeepCaps(); err != nil {
		fmt.Println(err)
		return
	} else {
		fmt.Println("*     secessfully set keep caps     *")
	}
	fmt.Println("|***********************************|")

	// The return values are not checked, so a failed switch is visible only in the output below
	syscall.Setgid(1000)
	syscall.Setuid(1000)
	//syscall.Setgid(0)
	//syscall.Setuid(0)
	fmt.Println("after set, the uid is ", syscall.Getuid())
	fmt.Println("after set, the gid is ", syscall.Getgid())
	fmt.Println("after set, the effective uid is ", syscall.Geteuid())

	// if err := ClearKeepCaps(); err != nil {
	// 	return
	// }
	// fmt.Println("after Clear, the uid is ", syscall.Getuid())
	// fmt.Println("after Clear, the gid is ", syscall.Getgid())
	time.Sleep(100 * time.Second)
}

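What the code does:

First, it sets the KEEP_CAPS flag
It calls the setgid and setuid system calls inside the program to switch the process from root to a regular user
It sleeps for 100 s; during that window you can find the program with ps and inspect its capabilities

Usage: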

# bash command
ubuntu@ubuntu-standard-pc:~/codes/go/capability$ go build setid.go

go build produces an executable named setid, with no extension.

Then run setid as root; the executable switches from root to UID 1000.

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ sudo ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000

The program runs as expected. Press Ctrl+C to exit, then start it again in the background.

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ sudo ./setid &
[1] 3778
ubuntu@ubuntu-standard-pc:~/codes/go/capability$ Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000

After it starts, press Enter once to get the prompt back.

The PID is 3778, so look up the capabilities (Cap fields) in /proc/3778/status.

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ cat /proc/3778/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000

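After the switch from root to a regular user, the process holds capabilities only in P and B; the kernel cleared E. This agrees with the kernel code.

These are the capabilities of the process: pP and pB are what remain.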

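User switch	KEEP_CAPS set?	Sets before the switch	Sets after the switch
root -> root	yes/no	E, I, P, B, A	E, I, P, B, A (nothing cleared)
root -> regular	yes	E, I, P, B, A	I, P, B (E and A cleared)
root -> regular	no	E, I, P, B, A	I, B (E, P, and A cleared)
regular -> root	yes/no	E, I, P, B, A	E, I, P, B, A (E = P)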

1.4 File capabilities

Files have only E, I, and P; they have no A or B set. (For a file, E is a single bit rather than a set.)

1.4.1 Viewing a file's capabilities

Below, fI, fP, and fE denote a file's I, P, and E.

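Files carry capabilities too, and these determine which sensitive operations a user can perform when executing the file. In practice it is the capabilities of executables that matter.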

For example, the shell in our terminal is an executable located at /bin/bash, and we can inspect its capabilities.

ubuntu@ubuntu-standard-pc:~$ getcap /bin/bash
ubuntu@ubuntu-standard-pc:~$

The file has no capabilities.

Now check the setid executable built earlier:

ubuntu@ubuntu-standard-pc:~/codes/go/capability$ getcap setid
ubuntu@ubuntu-standard-pc:~/codes/go/capability$

The fI, fE, and fP of the setid executable are empty as well.

Important: the file capabilities shown by getcap are the ones that apply when a regular user executes the file.

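For root, the system treats the file as if all capabilities were set, that is, fE, fI, and fP are all 1. The 1 here is the value used in the formulas later on; which capabilities root actually ends up with still depends on the bounding set B of the root process (cat /proc/$$/status | grep CapBnd).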

The official explanation for the root case:

   1. If the real or effective user ID of the process is 0 (root),
      then the file inheritable and permitted sets are ignored;
      instead they are notionally considered to be all ones (i.e.,
      all capabilities enabled).  (There is one exception to this
      behavior, described below in Set-user-ID-root programs that
      have file capabilities.)

   2. If the effective user ID of the process is 0 (root) or the
      file effective bit is in fact enabled, then the file effective
      bit is notionally defined to be one (enabled).

1.4.2 Granting capabilities to a file

Take the setid executable from above. Its file capabilities for regular users are empty, so let us grant it some.

Use the setcap command to grant capabilities.

root@ubuntu-standard-pc:/home/ubuntu/codes/go/capability# setcap CAP_SYS_ADMIN+eip setid
root@ubuntu-standard-pc:/home/ubuntu/codes/go/capability# getcap setid
setid = cap_sys_admin+eip

The +eip in the command (=eip also works) adds cap_sys_admin to fE, to fI, and to fP.

The grant succeeded: E, I, and P of the setid executable now all contain cap_sys_admin.
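
For readers curious about where setcap stores this, file capabilities live in the security.capability extended attribute. The sketch below decodes the revision 2/3 layout of that attribute in Go. The type and function names are hypothetical, the layout constants are written from my reading of the kernel's struct vfs_cap_data, and the sample bytes are hand-built to represent cap_sys_admin+eip rather than read from a real file (on Linux the raw bytes could be fetched with syscall.Getxattr).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Layout constants of the security.capability xattr (struct vfs_cap_data).
const (
	vfsCapRevisionMask   = 0xFF000000
	vfsCapRevision2      = 0x02000000
	vfsCapRevision3      = 0x03000000
	vfsCapFlagsEffective = 0x000001
)

// fileCaps holds the decoded fP and fI masks and the single fE bit.
type fileCaps struct {
	permitted   uint64
	inheritable uint64
	effective   bool
}

// parseFileCaps decodes a revision 2 or 3 xattr value: a little-endian magic
// word followed by two (permitted, inheritable) pairs of 32-bit words, the
// first pair for capabilities 0-31 and the second for capabilities 32-63.
func parseFileCaps(raw []byte) (fileCaps, error) {
	if len(raw) < 20 {
		return fileCaps{}, errors.New("too short for a revision 2 or 3 capability xattr")
	}
	magic := binary.LittleEndian.Uint32(raw[0:4])
	if rev := magic & vfsCapRevisionMask; rev != vfsCapRevision2 && rev != vfsCapRevision3 {
		return fileCaps{}, fmt.Errorf("unsupported revision %#x", rev)
	}
	word := func(i int) uint64 {
		return uint64(binary.LittleEndian.Uint32(raw[4*i : 4*i+4]))
	}
	return fileCaps{
		permitted:   word(1) | word(3)<<32,
		inheritable: word(2) | word(4)<<32,
		effective:   magic&vfsCapFlagsEffective != 0,
	}, nil
}

func main() {
	// Hand-built sample: revision 2, effective bit set, bit 21 (cap_sys_admin)
	// in both the permitted and the inheritable word.
	sample := []byte{
		0x01, 0x00, 0x00, 0x02,
		0x00, 0x00, 0x20, 0x00,
		0x00, 0x00, 0x20, 0x00,
		0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x00,
	}
	fc, err := parseFileCaps(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("fP=%#x fI=%#x fE=%v\n", fc.permitted, fc.inheritable, fc.effective)
}
```

The decoded result makes the earlier point concrete: fP and fI are masks, while fE is only a flag.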

1.5 Capabilities when a process creates a child process

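When a process creates a child process that runs a new program, the capabilities can change.

When the process calls fork(), nothing changes: the child inherits the parent's capabilities exactly.

When the process calls exec(), however, the capabilities do change, according to the following formulas:

If the process runs as root, the rules are: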

       p'P = pI | pB

       p'E = p'P
       p'I = pI
       p'B = pB

If the process runs as a regular user, the rules are:

       p'A = (file is privileged) ? 0 : pA

       p'P= (pI & fI) | (fP & pB) | p'A
       
       p'E = fE ? p'P : p'A

       p'I = pI

       p'B = pB
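
The formulas translate almost directly into code. Below is a minimal Go sketch of them; the type and function names (procCaps, fileCaps, execTransform, rootFileCaps) are hypothetical, and the root case is modeled the way the man page describes it, by treating fI and fP as all ones and fE as set.

```go
package main

import "fmt"

// procCaps holds the five process sets as bitmasks.
type procCaps struct {
	P, E, I, B, A uint64
}

// fileCaps holds the file's fP and fI masks, its single fE bit, and whether
// the file counts as privileged for the ambient rule (set-user-ID or
// set-group-ID bit, or any file capability).
type fileCaps struct {
	P, I       uint64
	E          bool
	Privileged bool
}

// execTransform applies the regular-user formulas for exec().
func execTransform(p procCaps, f fileCaps) procCaps {
	var next procCaps
	if !f.Privileged {
		next.A = p.A
	}
	next.P = (p.I & f.I) | (f.P & p.B) | next.A
	if f.E {
		next.E = next.P
	} else {
		next.E = next.A
	}
	next.I = p.I
	next.B = p.B
	return next
}

// rootFileCaps models the root case: fI and fP are notionally all ones and
// fE is notionally set, which reduces the formulas to p'P = pI | pB, p'E = p'P.
func rootFileCaps() fileCaps {
	return fileCaps{P: ^uint64(0), I: ^uint64(0), E: true}
}

func main() {
	const dockerDefault = uint64(0xa80425fb) // mask taken from the container examples in this post

	// A regular user executes a file that has no file capabilities.
	user := execTransform(procCaps{P: dockerDefault, E: dockerDefault, I: dockerDefault, B: dockerDefault}, fileCaps{})
	fmt.Printf("regular user: %+v\n", user)

	// root executes any file.
	root := execTransform(procCaps{B: dockerDefault}, rootFileCaps())
	fmt.Printf("root:         %+v\n", root)
}
```

For the regular user, P and E come out empty while I and B are carried over unchanged; for root, P and E equal the bounding set.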

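Capabilities in Docker

runc starts a container for Docker as follows:

The runc init process is started by root, so its user is root
pB is set. At this point pB is already Docker's default capability set, while pE, pP, and pI still hold their original capabilities and pA is empty by default
The KEEP_CAPS flag is set so that pP is kept
setuid and setgid are called. The process goes from root to a regular user and loses privileges: only pB, pI, and pP remain, and pE is empty
Now running as a regular user, the process sets pB, pI, pP, and pE again, so all of these sets are populated
The regular user calls exec(). Privileges are dropped according to the transformation rules for regular users

The corresponding code: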

func finalizeNamespace(config *initConfig) error {
	// Ensure that all unwanted fds we may have accidentally
	// inherited are marked close-on-exec so they stay out of the
	// container
	if err := utils.CloseExecFrom(config.PassedFilesCount + 3); err != nil {
		return err
	}

	capabilities := config.Config.Capabilities
	if config.Capabilities != nil {
		capabilities = config.Capabilities
	}
	w, err := newCapWhitelist(capabilities)
	if err != nil {
		return err
	}
	// drop capabilities in bounding set before changing user
	if err := w.dropBoundingSet(); err != nil {
		return err
	}
	// preserve existing capabilities while we change users
	if err := system.SetKeepCaps(); err != nil {
		return err
	}
	if err := setupUser(config); err != nil {
		return err
	}
	if err := system.ClearKeepCaps(); err != nil {
		return err
	}
	// drop all other capabilities
	if err := w.drop(); err != nil {
		return err
	}
	if config.Cwd != "" {
		if err := syscall.Chdir(config.Cwd); err != nil {
			return fmt.Errorf("chdir to cwd (%q) set in config.json failed: %v", config.Cwd, err)
		}
	}
	return nil
}

The code that sets the capabilities:

func (c *capsV3) Set(which CapType, caps ...Cap) {
	for _, what := range caps {
		var i uint
		if what > 31 {
			i = uint(what) >> 5
			what %= 32
		}

		if which&EFFECTIVE != 0 {
			c.data[i].effective |= 1 << uint(what)
		}
		if which&PERMITTED != 0 {
			c.data[i].permitted |= 1 << uint(what)
		}
		if which&INHERITABLE != 0 {
			c.data[i].inheritable |= 1 << uint(what)
		}
		if which&BOUNDING != 0 {
			c.bounds[i] |= 1 << uint(what)
		}
	}
}

In this runc capability setup, set A is never set and never cleared, so A stays empty.

The top-level runc flow:

func (l *linuxSetnsInit) Init() error {
	if !l.config.Config.NoNewKeyring {
		// do not inherit the parent's session keyring
		if _, err := keys.JoinSessionKeyring(l.getSessionRingName()); err != nil {
			return err
		}
	}
	if l.config.NoNewPrivileges {
		if err := system.Prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); err != nil {
			return err
		}
	}
	if l.config.Config.Seccomp != nil {
		if err := seccomp.InitSeccomp(l.config.Config.Seccomp); err != nil {
			return err
		}
	}
	if err := finalizeNamespace(l.config); err != nil {
		return err
	}
	if err := apparmor.ApplyProfile(l.config.AppArmorProfile); err != nil {
		return err
	}
	if err := label.SetProcessLabel(l.config.ProcessLabel); err != nil {
		return err
	}
	// close the statedir fd before exec because the kernel resets dumpable in the wrong order
	// https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
	syscall.Close(l.stateDirFD)
	return system.Execv(l.config.Args[0], l.config.Args[0:], os.Environ())
}

After execv is called, privileges are dropped.
The calculation:

       p'A = (file is privileged) ? 0 : pA

       p'P= (pI & fI) | (fP & pB) | p'A
       
       p'E = fE ? p'P : p'A

       p'I = pI

       p'B = pB

Because every file capability f is 0 and pA is also 0, p'A = 0, which gives:
p'A = 0
p'P = 0
p'E = 0
p'I = pI
p'B = pB

2. Docker with userns-remap enabled

2.1 root user inside the container

First create the user on the host

groupadd -g 10000 dockeruser
useradd -u 10000 -g dockeruser -d /home/dockeruser -m dockeruser

Enable userns-remap

#vim /etc/docker/daemon.json
{
	...
	"userns-remap":"dockeruser",
	...
}

systemctl stop docker
systemctl daemon-reload
systemctl start docker

After userns-remap is enabled:

The Dockerfile:

FROM centos
ADD setid .
RUN setcap cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip setid

docker build

root@ubuntu-standard-pc:~# docker build -t centos:host-root-origin .

docker run

root@ubuntu-standard-pc:~# docker run -it --name centos-host-root-origin centos:host-root-origin /bin/bash

2.1.1 Capabilities seen inside the container

[root@e72c2e81500e /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=

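The capabilities are the same as without userns-remap, and the container's /root directory is accessible.

2.1.2 Capabilities seen from the host

Find the host PID of the container process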

ubuntu@ubuntu-standard-pc:~$ ps -ef | grep e72c2e81500e
root        4253       1  0 23:12 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e72c2e81500ec485c5664216ead98eaf5e7b7fd71b4521d4748ea6e87dbac2a3 -address /run/containerd/containerd.sock
ubuntu      4342    4334  0 23:13 pts/2    00:00:00 grep --color=auto e72c2e81500e
ubuntu@ubuntu-standard-pc:~$ ps -ef | grep 4253
root        4253       1  0 23:12 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e72c2e81500ec485c5664216ead98eaf5e7b7fd71b4521d4748ea6e87dbac2a3 -address /run/containerd/containerd.sock
165536      4274    4253  0 23:12 pts/0    00:00:00 /bin/bash
ubuntu      4344    4334  0 23:13 pts/2    00:00:00 grep --color=auto 4253

Check the container process's capabilities from the host

ubuntu@ubuntu-standard-pc:~$ cat /proc/4274/status | grep Cap
CapInh:	00000000a80425fb
CapPrm:	00000000a80425fb
CapEff:	00000000a80425fb
CapBnd:	00000000a80425fb
CapAmb:	0000000000000000
ubuntu@ubuntu-standard-pc:~$ capsh --decode=00000000a80425fb
WARNING: libcap needs an update (cap=40 should have a name).
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap

On the host, the capabilities are the same as those seen inside the container.

2.2 Regular user inside the container
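
What capsh --decode does can be reproduced in a few lines. The following Go sketch maps each set bit to its capability name; the table follows the numbering in linux/capability.h up to capability 40, and the function names decodeCaps and decodeCapsString are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// capNames is indexed by capability number, following linux/capability.h.
var capNames = []string{
	"cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
	"cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid",
	"cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
	"cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
	"cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
	"cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
	"cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
	"cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
	"cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
	"cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
	"cap_checkpoint_restore",
}

// decodeCaps returns the names of all capabilities whose bit is set in mask.
// Bits beyond the table are reported by number.
func decodeCaps(mask uint64) []string {
	var names []string
	for bit := 0; bit < 64; bit++ {
		if mask&(uint64(1)<<uint(bit)) == 0 {
			continue
		}
		if bit < len(capNames) {
			names = append(names, capNames[bit])
		} else {
			names = append(names, fmt.Sprintf("cap_%d", bit))
		}
	}
	return names
}

// decodeCapsString joins the decoded names with commas, like capsh does.
func decodeCapsString(mask uint64) string {
	return strings.Join(decodeCaps(mask), ",")
}

func main() {
	fmt.Printf("0x%016x=%s\n", uint64(0xa80425fb), decodeCapsString(0xa80425fb))
}
```

For 0xa80425fb this yields the same fourteen names as the capsh output above.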

Dockerfile

FROM centos
ADD setid .
ADD helloworld .
ADD setid-chmod .
ADD setrootid .
RUN chmod +s setid-chmod
RUN chmod +s setrootid
RUN groupadd -g 20000 dockercentos
RUN useradd -u 20000 -g dockercentos -d /home/dockercentos -m dockercentos
USER dockercentos

docker build

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-usr# docker build -t centos:host-usr-chmod .
Sending build context to Docker daemon  7.139MB
Step 1/10 : FROM centos
 ---> 5d0da3dc9764
Step 2/10 : ADD setid .
 ---> Using cache
 ---> 01aae451ee4a
Step 3/10 : ADD helloworld .
 ---> Using cache
 ---> 91b5dc55ce84
Step 4/10 : ADD setid-chmod .
 ---> Using cache
 ---> a2b67950e1b2
Step 5/10 : ADD setrootid .
 ---> Using cache
 ---> af10aae2cff3
Step 6/10 : RUN chmod +s setid-chmod
 ---> Using cache
 ---> 62b534a30c89
Step 7/10 : RUN chmod +s setrootid
 ---> Using cache
 ---> 9afa21fd8b32
Step 8/10 : RUN groupadd -g 20000 dockercentos
 ---> Running in d233ce98d0f0
Removing intermediate container d233ce98d0f0
 ---> a29952a344b4
Step 9/10 : RUN useradd -u 20000 -g dockercentos -d /home/dockercentos -m dockercentos
 ---> Running in 9bee1a5b9ad6
Removing intermediate container 9bee1a5b9ad6
 ---> 3ccf8e7c7b7d
Step 10/10 : USER dockercentos
 ---> Running in 05feed2c8819
Removing intermediate container 05feed2c8819
 ---> 21b170f459fb
Successfully built 21b170f459fb
Successfully tagged centos:host-usr-chmod

docker run

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-usr# docker run -it --name centos-df-usr centos:host-usr-chmod /bin/bash

2.2.1 Capabilities seen inside the container

[dockercentos@b4b1eccfcd6c /]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=20000(dockercentos)
gid=20000(dockercentos)
groups=

2.2.2 Capabilities seen from the host

ubuntu@ubuntu-standard-pc:~$ ps -ef | grep b4b1eccfcd6c
root       12587       1  0 00:57 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id b4b1eccfcd6ce1daadbe5fb6059cb9a6fea631f81eaaf0b3ae97ba839e41f64b -address /run/containerd/containerd.sock
ubuntu     12656    8418  0 00:58 pts/2    00:00:00 grep --color=auto b4b1eccfcd6c
ubuntu@ubuntu-standard-pc:~$ ps -ef | grep 12587
root       12587       1  0 00:57 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id b4b1eccfcd6ce1daadbe5fb6059cb9a6fea631f81eaaf0b3ae97ba839e41f64b -address /run/containerd/containerd.sock
185536     12609   12587  0 00:57 pts/0    00:00:00 /bin/bash
ubuntu     12658    8418  0 00:58 pts/2    00:00:00 grep --color=auto 12587

Check the capabilities from the host

ubuntu@ubuntu-standard-pc:~$ cat /proc/12609/status | grep Cap
CapInh:	00000000a80425fb
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	00000000a80425fb
CapAmb:	0000000000000000

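When the container runs as a regular user, the process seen from the host has capabilities only in I and B.

2.3 chmod in containers

If a file inside the container is granted a capability in the Dockerfile but the container itself does not have that capability, the file cannot be executed.
For example: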

Dockerfile

FROM centos
ADD setid .
ADD helloworld .
RUN setcap cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip setid
RUN setcap cap_sys_admin+eip helloworld

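The helloworld file carries cap_sys_admin, but the container's default capabilities do not include it.
The capabilities set on the setid file equal the container's default capability set.

Build the Docker image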

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root-cap# docker build -t centos:host-root-captest .
Sending build context to Docker daemon  3.562MB
Step 1/5 : FROM centos
 ---> 5d0da3dc9764
Step 2/5 : ADD setid .
 ---> Using cache
 ---> fff6dba319f3
Step 3/5 : ADD helloworld .
 ---> Using cache
 ---> e22e26214e9d
Step 4/5 : RUN setcap cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip setid
 ---> Running in 2094b4999f5f
Removing intermediate container 2094b4999f5f
 ---> 44ce5251d8b7
Step 5/5 : RUN setcap cap_sys_admin+eip helloworld
 ---> Running in 871c82fa3e98
Removing intermediate container 871c82fa3e98
 ---> d2bfa0175e88
Successfully built d2bfa0175e88
Successfully tagged centos:host-root-captest

Run the image

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root-cap# docker run -it --rm centos:host-root-captest /bin/bash

[root@339688eb874d /]# ls
bin  dev  etc  helloworld  home  lib  lib64  lost+found  media	mnt  opt  proc	root  run  sbin  setid	srv  sys  tmp  usr  var
[root@339688eb874d /]# ./helloworld 
bash: ./helloworld: Operation not permitted
[root@339688eb874d /]# ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000

helloworld is not permitted to run, while setid runs.

2.3.1 Assigning capabilities with --cap-drop and --cap-add

2.3.1.1 root user inside the container
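
One plausible explanation for the EPERM is the safety check that capabilities(7) describes for capability-dumb binaries: when the file effective bit is set, exec fails if the process would not obtain every capability in the file's permitted set. The Go sketch below models that check under this assumption; the function names are hypothetical and the masks are the ones that appear in this post.

```go
package main

import "fmt"

const (
	capSysAdmin   = uint64(1) << 21    // cap_sys_admin is capability number 21
	dockerDefault = uint64(0xa80425fb) // Docker default set seen in the outputs of this post
)

// missingForcedCaps returns the file-permitted capabilities that the process
// would NOT obtain on exec, given its inheritable and bounding sets.
func missingForcedCaps(pI, pB, fI, fP uint64) uint64 {
	gained := (pB & fP) | (pI & fI)
	return fP &^ gained
}

// execDenied models the check: exec is refused when the file effective bit
// is set and at least one file-permitted capability cannot be granted.
func execDenied(pI, pB, fI, fP uint64, fE bool) bool {
	return fE && missingForcedCaps(pI, pB, fI, fP) != 0
}

func main() {
	// helloworld carries cap_sys_admin+eip, which is outside the bounding set.
	fmt.Println("helloworld denied:", execDenied(dockerDefault, dockerDefault, capSysAdmin, capSysAdmin, true))
	// setid carries exactly the container's default set, with +eip.
	fmt.Println("setid denied:     ", execDenied(dockerDefault, dockerDefault, dockerDefault, dockerDefault, true))
}
```

Under this model helloworld is refused and setid is allowed, which is consistent with the behavior observed in the container.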

The Dockerfile:

FROM centos
ADD setid .
ADD helloworld .
ADD setid-chmod .
ADD setrootid .
RUN chmod +s setid-chmod
RUN chmod +s setrootid

setrootid is setid.go with the setuid and setgid arguments changed to 0.

docker build

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker build -t centos:host-chmod .
Sending build context to Docker daemon  7.139MB
Step 1/7 : FROM centos
 ---> 5d0da3dc9764
Step 2/7 : ADD setid .
 ---> 01aae451ee4a
Step 3/7 : ADD helloworld .
 ---> 91b5dc55ce84
Step 4/7 : ADD setid-chmod .
 ---> a2b67950e1b2
Step 5/7 : ADD setrootid .
 ---> af10aae2cff3
Step 6/7 : RUN chmod +s setid-chmod
 ---> Running in 5cd11d90e4ee
Removing intermediate container 5cd11d90e4ee
 ---> 62b534a30c89
Step 7/7 : RUN chmod +s setrootid
 ---> Running in 39a322c185dd
Removing intermediate container 39a322c185dd
 ---> 9afa21fd8b32
Successfully built 9afa21fd8b32
Successfully tagged centos:host-chmod

2.3.1.1.1 Without no-new-privileges

Run the container

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --rm centos:host-chmod /bin/bash
[root@07b6e17cb6d7 /]# ls
bin  etc	 home  lib64	   media  opt	root  sbin   setid-chmod  srv  tmp  var
dev  helloworld  lib   lost+found  mnt	  proc	run   setid  setrootid	  sys  usr
[root@07b6e17cb6d7 /]# ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@07b6e17cb6d7 /]# ./setid-chmod 
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@07b6e17cb6d7 /]# ./setrootid 
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  0
after set, the gid is  0
after set, the effective uid is  0

setuid, setgid, and similar calls work. Inside the container, before the setuid call, the effective UID is already 0 (root).

2.3.1.1.2 With no-new-privileges

docker run

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --rm --security-opt=no-new-privileges centos:host-chmod /bin/bash
[root@1c3a94e2c741 /]# ls
bin  etc	 home  lib64	   media  opt	root  sbin   setid-chmod  srv  tmp  var
dev  helloworld  lib   lost+found  mnt	  proc	run   setid  setrootid	  sys  usr
[root@1c3a94e2c741 /]# ./setid
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@1c3a94e2c741 /]# ./setid-chmod 
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
[root@1c3a94e2c741 /]# ./setrootid 
Hello world!
before set, the uid is  0
before set, the gid is  0
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  0
after set, the gid is  0
after set, the effective uid is  0

The result is the same as without no-new-privileges: the effective UID is still 0 (root).

2.3.1.2 Regular user inside the container

2.3.1.2.1 Without no-new-privileges

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --rm --user 10000:10000 --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} centos:host-chmod /bin/bash
bash-4.4$ ./setid
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000
^C
bash-4.4$ ./setid-chmod 
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  1000
after set, the gid is  1000
after set, the effective uid is  1000
^C
bash-4.4$ ./setrootid 
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  0
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  0
after set, the gid is  0
after set, the effective uid is  0

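When both the UID and the GID for the container are specified and neither is root, the process inside the container is a plain regular user. The setid executable was not elevated with chmod, so it lacks the privilege for setuid and setgid and the switch does not happen (the UID and GID stay at 10000).
The setid-chmod executable, by contrast, was elevated with chmod +s in the Dockerfile, so it runs with root privileges (euid 0) and can call setuid and setgid. Inside the container it runs as root.
In the same way, a file elevated with chmod +s (setrootid) can call setuid and setgid to switch to the root user.

2.3.1.2.2 With no-new-privileges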

docker run

root@ubuntu-standard-pc:/home/ubuntu/docker/Dockerfiles/centos-root# docker run -it --user 10000:10000 --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --rm --security-opt=no-new-privileges centos:host-chmod /bin/bash
bash-4.4$ ls
bin  etc	 home  lib64	   media  opt	root  sbin   setid-chmod  srv  tmp  var
dev  helloworld  lib   lost+found  mnt	  proc	run   setid  setrootid	  sys  usr
bash-4.4$ ./setid
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000
^C
bash-4.4$ ./setid-chmod 
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000
^C
bash-4.4$ ./setrootid 
Hello world!
before set, the uid is  10000
before set, the gid is  10000
before set, the effective uid is  10000
|***********************************|
*     secessfully set keep caps     *
|***********************************|
after set, the uid is  10000
after set, the gid is  10000
after set, the effective uid is  10000

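With no-new-privileges enabled, chmod +s can no longer be used to let a regular user's process switch to root.
When both the UID and the GID for the container are specified and neither is root, the process is a plain regular user with no setuid or setgid privilege, so those calls have no effect.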
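
The difference between the two runs comes down to whether exec honors the set-user-ID bit. A tiny Go model of that rule follows; it is an illustrative sketch only, the names execFile and euidAfterExec are hypothetical, and other reasons for ignoring the bit (such as nosuid mounts) are left out.

```go
package main

import "fmt"

// execFile describes the two properties of an executable that matter here:
// its owner and whether the set-user-ID bit (chmod +s) is set.
type execFile struct {
	ownerUID uint32
	setuid   bool
}

// euidAfterExec returns the effective UID a process has after exec.
// With no_new_privs set, the set-user-ID bit is ignored.
func euidAfterExec(euid uint32, f execFile, noNewPrivs bool) uint32 {
	if f.setuid && !noNewPrivs {
		return f.ownerUID
	}
	return euid
}

func main() {
	setidChmod := execFile{ownerUID: 0, setuid: true} // root-owned file with chmod +s
	plainSetid := execFile{ownerUID: 0, setuid: false}

	fmt.Println("setid-chmod, no-new-privileges off:", euidAfterExec(10000, setidChmod, false))
	fmt.Println("setid-chmod, no-new-privileges on: ", euidAfterExec(10000, setidChmod, true))
	fmt.Println("setid,       no-new-privileges off:", euidAfterExec(10000, plainSetid, false))
}
```

The three results (0, 10000, 10000) line up with the "effective uid" lines printed by setid-chmod and setid in the runs above.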

docker run -it --name centos-host-runroot-nonewprivileges --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --security-opt=no-new-privileges centos:host-root-origin /bin/bash

[root@LIN-29076BB8489 centos-root]# docker run -it --name centos-host-root-nonewprivileges --cap-drop all --cap-add={cap_setgid,cap_setuid,cap_setfcap} --security-opt=no-new-privileges centos:host-root-origin /bin/bash
[root@3215a191e737 /]# capsh --print
Current: = cap_setgid,cap_setuid,cap_setfcap+eip
Bounding set =cap_setgid,cap_setuid,cap_sys_admin,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=

Check the container's capabilities from the host

[root@LIN-29076BB8489 centos-org]# ps -ef | grep 3215a191e737
root      8080     1  0 17:11 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3215a191e73770dd26b393ba99acf183d0380d717da6e2c68e167f554c57a418 -address /run/containerd/containerd.sock
root      9644 58947  0 17:13 pts/1    00:00:00 grep --color=auto 3215a191e737
[root@LIN-29076BB8489 centos-org]# ps -ef | grep 8080
root      8080     1  0 17:11 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3215a191e73770dd26b393ba99acf183d0380d717da6e2c68e167f554c57a418 -address /run/containerd/containerd.sock
100000    8100  8080  0 17:11 pts/0    00:00:00 /bin/bash
root      9697 58947  0 17:13 pts/1    00:00:00 grep --color=auto 8080

8100 is the PID of the container's main process, /bin/bash.

[root@LIN-29076BB8489 centos-org]# cat /proc/8100/status | grep Cap
CapInh: 00000000800000c0
CapPrm: 00000000800000c0
CapEff: 00000000800000c0
CapBnd: 00000000800000c0
CapAmb: 0000000000000000
[root@LIN-29076BB8489 centos-org]# capsh --decode=00000000800000c0
0x00000000800000c0=cap_setgid,cap_setuid,cap_setfcap

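On the host, the container's main process holds only the limited capabilities it was granted.

3. Summary

3.1 Capability changes while a Docker container starts

If the container runs as a regular user, the capabilities in the container are
p'A = 0
p'P = 0
p'E = 0
p'I = pI
p'B = pB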

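3.2 Restricting a container's capabilities

There are three ways to restrict a container's capabilities.

Enable userns-remap in Docker, which maps root inside the container to a regular user on the host
Run the container as a regular user. Docker calls setuid when it starts the container, which drops privileges at once: only the I set remains (inside the bounding set B), and the process has no effective privileges on the host.
Use cap-drop all together with cap-add, so that the container has only the few capabilities named by cap-add.

Enabling userns-remap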

#vim /etc/docker/daemon.json
{
	...	
	"userns-remap":"<username>",
	...
}

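With userns-remap enabled, users inside the container are mapped to the specified host user. When that host user is not root, root inside the container maps to a regular user on the host.

If cap-drop is not used, the container has Docker's default capabilities, and when the container runs as root it holds them in the E, I, P, and B sets.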
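
The host UIDs seen in the ps output earlier (165536 for container root, 185536 for container UID 20000) fit a simple offset mapping. The Go sketch below works through that arithmetic. It rests on assumptions: the range start 165536 is inferred from that output, the range length 65536 is a typical subordinate-ID range size rather than something this post shows, and the names idRange and hostUID are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// idRange models one subordinate-ID range as used by userns-remap:
// container IDs 0..count-1 map to host IDs hostStart..hostStart+count-1.
type idRange struct {
	hostStart uint32
	count     uint32
}

// hostUID translates a UID inside the container to the UID seen on the host.
func hostUID(r idRange, containerUID uint32) (uint32, error) {
	if containerUID >= r.count {
		return 0, errors.New("container UID is outside the mapped range")
	}
	return r.hostStart + containerUID, nil
}

func main() {
	// Assumed range: start inferred from the ps output, length assumed.
	r := idRange{hostStart: 165536, count: 65536}
	for _, uid := range []uint32{0, 20000} {
		h, err := hostUID(r, uid)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("container uid %d -> host uid %d\n", uid, h)
	}
}
```

Under these assumptions, container UID 0 maps to 165536 and container UID 20000 maps to 185536, so even root in the container is an unprivileged UID on the host.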

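3.2.1 Enabling userns-remap

userns-remap enabled?	Effect
yes	On the host the container runs as a `regular user`. To use a host-side privilege, add the corresponding capability to the container with cap-add; otherwise the container has only Docker's default capabilities
no	On the host the container runs as the `root user` and has root's privileges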

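3.2.2 Restricting the user inside the container

User inside the container	Effect
regular user	Inside the container and on the host, only the `I` set is populated (apart from the bounding set B). Any privileged operation requires granting capabilities to the program in the Dockerfile beforehand
root user	Inside the container and on the host, the `E, I, P, B` sets are populated. A privileged operation only needs the matching cap-add; no grant in the Dockerfile is required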

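3.2.3 Using cap-add and cap-drop

The usual approach is to remove Docker's default capabilities with cap-drop=all and then add the required ones with cap-add.

cap-drop and cap-add used?	Effect
yes	The container has only the capabilities added by cap-add after cap-drop removed the rest, and nothing else
no	The container has Docker's default capabilities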

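--security-opt=no-new-privileges used?	Effect
yes	A regular user's process elevated through chmod +s `(uid != 0, euid = 0)` is blocked and cannot perform `setuid` and similar operations. `No effect on the root user`
no	A regular user's process elevated through chmod +s `(uid != 0, euid = 0)` can perform `setuid` and similar operations, switch its uid to 0, and obtain root's full privileges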

4. Recommendations

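4.1 Guiding principle
The container's capabilities must be a superset of (>=) the program's capabilities; otherwise the program cannot run.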

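4.2 Recommended option 1
Enable userns-remap, run as root inside the container, use cap-drop all with cap-add={required capabilities}, and set no-new-privileges.

Pro: no setcap is needed in the Dockerfile, so the image has one layer fewer.

Con: the user inside the container is root and holds the E, I, and P sets on the host, so it can perform some privileged operations.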

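4.3 Recommended option 2
Enable userns-remap, run as a regular user inside the container, use cap-drop all with cap-add={required capabilities}, run setcap in the Dockerfile, and set no-new-privileges.

Pro: the user inside the container is a regular user and holds only the I set on the host. A process in the container whose executable was not given capabilities with setcap in the Dockerfile cannot perform privileged operations.

Con: setcap is required in the Dockerfile, so the image has one more layer, and small set-user-ID helper programs can no longer be used to elevate privileges.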
