Trying Out Kubernetes User Namespaces


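Preface

User namespaces map the UIDs and GIDs used inside a container, including root, to unprivileged UIDs and GIDs on the host, which limits the damage a container escape can cause.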


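I. Environment Requirements

The official Kubernetes documentation lists the following requirements. User namespaces are a Linux-only feature, and the filesystems used by the Pod must support idmap mounts. In practice this means you need at least Linux 6.3, as tmpfs started supporting idmap mounts in that version. This is usually needed because several Kubernetes features use tmpfs (the service account token that is mounted by default uses a tmpfs, Secrets use a tmpfs, etc.).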

Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs, ext4, xfs, fat, tmpfs, overlayfs.
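
As a rough illustration of the filesystem requirement, the sketch below checks a filesystem type against the list quoted above. The helper name `fs_supports_idmap` is hypothetical, and the extra names `vfat` and `overlay` are an assumption about how `findmnt` usually reports fat and overlayfs mounts.

```shell
# Hypothetical helper: succeeds when the given filesystem type appears in the
# list of filesystems quoted above as supporting idmap mounts in Linux 6.3.
# "vfat" and "overlay" are included on the assumption that this is how mount
# tools report fat and overlayfs.
fs_supports_idmap() {
  case "$1" in
    btrfs|ext4|xfs|fat|vfat|tmpfs|overlay|overlayfs) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with a literal value; on a real node the type could be read with
#   findmnt -no FSTYPE --target /var/lib/kubelet
# (the path is an assumption about where the kubelet keeps its data).
if fs_supports_idmap xfs; then
  echo "xfs is on the idmap mount list"
fi
```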

In addition, the container runtime and its underlying OCI runtime must support user namespaces. The following OCI runtimes offer support:

crun version 1.9 or greater (version 1.13+ is recommended).

Note:
Many OCI runtimes do not include the support needed for using user namespaces in Linux pods. If you use a managed Kubernetes, or have downloaded it from packages and set it up, it is likely that nodes in your cluster use a runtime that does not include this support. For example, the most widely used OCI runtime is runc, and version 1.1.z of runc does not support all the features needed by the Kubernetes implementation of user namespaces.
If there is a newer release of runc than 1.1 available for use, check its documentation and release notes for compatibility (look for idmap mounts support in particular, because that is the missing feature).

To use this feature with Kubernetes pods, you also need a CRI container runtime that supports it:

CRI-O: version 1.25 (and later) supports user namespaces for containers.
containerd: v1.7 is not compatible with the userns support in Kubernetes v1.27 to v1.30. Kubernetes v1.25 and v1.26 used an earlier implementation that is compatible with containerd v1.7 in terms of userns support. If you are using a version of Kubernetes other than 1.30, check the documentation for that version of Kubernetes for the most relevant information. If there is a newer release of containerd than v1.7 available for use, also check the containerd documentation for compatibility information.

You can see the status of user namespaces support in cri-dockerd tracked in an issue on GitHub.
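
To compare a node against the runtime requirements above, it helps to know which CRI runtime and version the node reports. The sketch below splits a value in the form Kubernetes stores in a node's `status.nodeInfo.containerRuntimeVersion` field; the function names and the sample string are illustrative, not taken from the demo machine.

```shell
# Hypothetical helpers: split a value such as "containerd://1.7.15-k3s1"
# (an illustrative sample) into the runtime name and its version.
runtime_name()    { printf '%s\n' "${1%%://*}"; }
runtime_version() { printf '%s\n' "${1#*://}"; }

sample='containerd://1.7.15-k3s1'
echo "runtime: $(runtime_name "$sample"), version: $(runtime_version "$sample")"
```

On a running cluster the input value can be obtained with `kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.containerRuntimeVersion}'`.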

II. Component Versions in the Demo Environment

1. Operating system version

I used Rocky Linux 9.4.

[root@k3s ~]# cat /etc/os-release 
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"

2. Upgrading the kernel

The official documentation requires at least Linux 6.3, as described in Part I.

# Import the public key and install the ELRepo RPM package
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf install https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
# list the available kernel related packages
dnf --disablerepo="*" --enablerepo="elrepo-kernel" list available
# install the latest mainline stable kernel
dnf --enablerepo=elrepo-kernel install kernel-ml
# By default the newly installed kernel becomes the default boot entry; verify it
grubby --default-kernel
# reboot
shutdown -r now

3. Verifying the kernel version

Example:

[root@k3s ~]# uname -a
Linux k3s 6.9.9-1.el9.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 11 12:42:03 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
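
The comparison against the 6.3 minimum can also be scripted. This is a minimal sketch, assuming a `sort` that supports version ordering with `-V` (as GNU coreutils does); the function name `version_at_least` is hypothetical.

```shell
# version_at_least HAVE WANT
# Succeeds when HAVE >= WANT. Everything after the first "-" in HAVE is
# dropped, so "6.9.9-1.el9.elrepo.x86_64" is compared as "6.9.9".
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "${1%%-*}" | sort -V | head -n1)" = "$2" ]
}

# Compare the running kernel against the documented minimum.
if version_at_least "$(uname -r)" 6.3; then
  echo "kernel is at least 6.3"
else
  echo "kernel is older than 6.3"
fi
```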

For upgrading the Rocky kernel, see "Upgrading the Rocky 9 kernel to 6.x".

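4. Kubernetes distribution

To simplify deploying Kubernetes, you can use Minikube, Kind, Kubeadm, K0S, K3S, or Sealos. For simplicity, I used K3S.
The bundled Kubernetes version is 1.26.15, because containerd 1.7.x is currently compatible only with the userns implementation in Kubernetes v1.25 and v1.26.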

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.26.15+k3s1 sh -

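III. Enabling the Feature Gate

1. Feature gate for versions 1.25-1.26

The feature gate is named UserNamespacesStatelessPodsSupport. Note that in releases after Kubernetes 1.27 the feature gate was renamed to UserNamespacesSupport. Reference blog post: "Feature gate change".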
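
Because the gate name depends on the Kubernetes version, a small helper can pick the right one. This is a sketch only: the function name `userns_gate_for` is hypothetical, and it assumes that "after 1.27" means 1.28 and later and that the gate does not exist before 1.25.

```shell
# Hypothetical helper: print the user namespace feature gate name for a
# Kubernetes version string such as "v1.26.15+k3s1".
# Assumption: 1.25 to 1.27 use the old name, 1.28 and later use the new one.
userns_gate_for() {
  minor="${1#v}"        # drop a leading "v"
  minor="${minor#*.}"   # drop the major version and its dot
  minor="${minor%%.*}"  # keep only the minor version
  if [ "$minor" -lt 25 ]; then
    return 1            # no user namespace gate before 1.25
  elif [ "$minor" -le 27 ]; then
    echo UserNamespacesStatelessPodsSupport
  else
    echo UserNamespacesSupport
  fi
}

echo "feature-gates=$(userns_gate_for v1.26.15+k3s1)=true"
```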

2. Editing the K3S systemd service file (the change is on the last line)

The location of the file can be found with systemctl status k3s.

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server --kube-apiserver-arg='feature-gates=UserNamespacesStatelessPodsSupport=true' \

3. Reloading the configuration

 # Reload the systemd configuration
 systemctl daemon-reload
 # Restart the service
 systemctl restart k3s
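
Before or after restarting, it can be worth confirming that the edit actually landed in the unit file. The following is a minimal sketch with a hypothetical helper name; the unit file path in the usage comment is an assumption and should be replaced with the path reported by systemctl status k3s.

```shell
# unit_has_gate FILE GATE
# Succeeds when FILE contains a feature-gates argument that sets GATE=true,
# either alone or as part of a comma-separated list of gates.
unit_has_gate() {
  grep -Eq -- "feature-gates=[^'\" ]*$2=true" "$1"
}

# Usage (the path is an assumption):
#   unit_has_gate /etc/systemd/system/k3s.service UserNamespacesStatelessPodsSupport \
#     && echo "feature gate is set in the unit file"
```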

IV. Verifying the User Namespace Feature

1. Writing the manifest user-namespace-stateless.yaml

apiVersion: v1
kind: Pod
metadata:
  name: userns
spec:
  hostUsers: false
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian

2. Creating the pod

kubectl apply -f user-namespace-stateless.yaml

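3. Checking the mapping

For detailed verification steps, see "Verifying that the user namespace feature is enabled".

# Open a shell in the container
kubectl exec -ti userns -- bash
# Read the namespace and mapping
readlink /proc/self/ns/user
cat /proc/self/uid_map
# Inside the container, the last number in uid_map must be 65536; on the host it must be a larger number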
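
The uid_map check described above can be automated. This is a sketch under the stated criterion only (the last number of the first mapping line is 65536); the helper name is hypothetical and the sample lines in the usage note are illustrative values, not output captured from the demo cluster.

```shell
# Hypothetical helper: read uid_map content on stdin and succeed when the
# first mapping line has a length (third field) of exactly 65536.
uid_map_is_userns() {
  awk 'NR == 1 { ok = ($3 == 65536) } END { exit (ok ? 0 : 1) }'
}

# Illustrative input in the uid_map format: container ID, host ID, length.
if printf '0 833617920 65536\n' | uid_map_is_userns; then
  echo "mapping length is 65536: the pod runs in its own user namespace"
fi
```

Against the pod created above, the same check could be run as `kubectl exec userns -- cat /proc/self/uid_map | uid_map_is_userns`.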

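Summary

That concludes this simple demonstration of user namespaces. Pay particular attention to whether the kernel version and the container runtime support the userns feature. Also note that in releases after Kubernetes 1.27 the feature gate was renamed to UserNamespacesSupport.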
