Docker Security

Adapted from Containers & Docker: How Secure are They?

There are three major areas to consider when reviewing Docker security:

  • the intrinsic security of containers, as implemented by kernel namespaces and cgroups;
  • the attack surface of the Docker daemon itself;
  • the "hardening" security features of the kernel and how they interact with containers.

Kernel Namespaces

Docker containers are very similar to LXC containers, and they come with similar security features. When you start a container with docker run, behind the scenes Docker creates a set of namespaces and control groups for the container.

Namespaces provide the first and most straightforward form of isolation: processes running within a container cannot see, let alone affect, processes running in another container or in the host system.
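To make the namespace isolation above more concrete, here is a minimal Python sketch, assuming a Linux host where /proc is mounted. It reads the namespace links that the kernel exposes for a process and compares two such listings. The function names are hypothetical and are not part of Docker.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {namespace_type: identifier} for a process.

    The identifiers are whatever the kernel reports in /proc/<pid>/ns.
    An empty dict is returned when that directory does not exist
    (for example on a non-Linux system).
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    namespaces = {}
    if not os.path.isdir(ns_dir):
        return namespaces
    for name in sorted(os.listdir(ns_dir)):
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry not readable as a link; skip it.
            continue
    return namespaces


def shared_namespaces(a, b):
    """Return the namespace types whose identifiers match in both listings."""
    return sorted(name for name in a if name in b and a[name] == b[name])
```

Comparing the listing of a process on the host with the listing of a process inside a container should show which namespace types differ; two processes with different pid namespace identifiers cannot see each other's process tree.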

Each container also gets its own network stack, meaning that a container does not get privileged access to the sockets or interfaces of another container. Of course, if the host system is set up accordingly, containers can interact with each other through their respective network interfaces, just as they can interact with external hosts. When you specify public ports for your containers or use links, IP traffic is allowed between containers. They can ping each other, send and receive UDP packets, and establish TCP connections, but that can be restricted if necessary. From a network architecture point of view, all containers on a given Docker host sit on bridge interfaces. This means that they are just like physical machines connected through a common Ethernet switch; no more, no less.

How mature is the code providing kernel namespaces and private networking? Kernel namespaces were introduced between kernel versions 2.6.15 and 2.6.26. This means that since July 2008 (the date of the 2.6.26 release, five years ago at the time of writing), namespace code has been exercised and scrutinized on a large number of production systems. And there is more: the design and inspiration for the namespaces code are even older. Namespaces are actually an effort to reimplement the features of OpenVZ in such a way that they could be merged into the mainstream kernel. OpenVZ was initially released in 2005, so both the design and the implementation are pretty mature.

Control Groups

Control Groups are the other key component of Linux Containers. They implement resource accounting and limiting. They provide a lot of very useful metrics, but they also help to ensure that each container gets its fair share of memory, CPU, and disk I/O; and, more importantly, that a single container cannot bring the system down by exhausting one of those resources.

So while they do not play a role in preventing one container from accessing or affecting the data and processes of another container, they are essential to fend off some denial-of-service attacks. They are particularly important on multi-tenant platforms, like public and private PaaS, to guarantee consistent uptime (and performance) even when some applications start to misbehave.
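As an illustration of the resource limiting described above, the following Python sketch assembles a docker run command line with explicit limits. It assumes a recent Docker CLI that supports the --memory, --cpus and --pids-limit flags (the release discussed in this text may predate some of them), and the default values and helper names are hypothetical.

```python
def resource_limit_args(memory="256m", cpus="0.5", pids_limit=100):
    """Build `docker run` arguments that cap memory, CPU and process count.

    The default values are arbitrary examples, not recommendations.
    """
    return [
        "--memory", str(memory),
        "--cpus", str(cpus),
        "--pids-limit", str(pids_limit),
    ]


def docker_run_command(image, extra_args=()):
    """Return the argument list for running `image` with the extra arguments."""
    return ["docker", "run", "--rm", *extra_args, image]
```

The resulting list can be passed to subprocess.run without a shell. The limits are enforced by the kernel through control groups, so a misbehaving application hits its own ceiling instead of starving its neighbours.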

Control Groups have been around for a while as well: the code was started in 2006, and initially merged in kernel 2.6.24.

Docker Daemon Attack Surface

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

First of all, only trusted users should be allowed to control your Docker daemon. This is a direct consequence of some powerful Docker features. Specifically, Docker allows you to share a directory between the Docker host and a guest container, and it allows you to do so without limiting the access rights of the container. This means that you can start a container where the /host directory is the / directory on your host, and the container will be able to alter your host filesystem without any restriction. Does this sound crazy? Keep in mind that all virtualization systems that allow filesystem resource sharing behave the same way. Nothing prevents you from sharing your root filesystem (or even your root block device) with a virtual machine.

This has a strong security implication: for example, if you instrument Docker from a web server to provision containers through an API, you should be even more careful than usual with parameter checking, to make sure that a malicious user cannot pass crafted parameters causing Docker to create arbitrary containers.
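The kind of parameter checking meant here could look like the following Python sketch. It is only one possible approach: the image names in ALLOWED_IMAGES and the function build_run_args are hypothetical, and the idea is simply to accept nothing but allowlisted images and well-formed environment variables, and never to let the caller supply volume or privilege options.

```python
import re

# Hypothetical allowlist: only these images may be started through the API.
ALLOWED_IMAGES = {"example/web:1.0", "example/worker:1.0"}

# Environment variable names must look like ordinary shell identifiers.
ENV_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_run_args(image, env=None):
    """Validate user-supplied parameters and return a `docker run` argv list.

    Raises ValueError for anything that is not explicitly allowed.
    """
    if image not in ALLOWED_IMAGES:
        raise ValueError("image is not on the allowlist")
    args = ["docker", "run", "--rm"]
    for name, value in (env or {}).items():
        if not ENV_NAME.match(name):
            raise ValueError("invalid environment variable name")
        if "\n" in value or "\0" in value:
            raise ValueError("invalid environment variable value")
        # Each NAME=value pair is one argv element, so it cannot
        # be reinterpreted as an additional option.
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args
```

Passing the result as an argument list (rather than a shell string) avoids shell injection, and because the caller never controls options such as volume mounts, a crafted request cannot share the host filesystem with the container.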

For this reason, the REST API endpoint (used by the Docker CLI to communicate with the Docker daemon) changed in Docker 0.5.2, and now uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the latter being prone to cross-site request forgery attacks if you happen to run Docker directly on your local machine, outside of a VM). You can then use traditional UNIX permission checks to limit access to the control socket.
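A traditional UNIX permission check on the control socket can be sketched in Python as follows. The path /var/run/docker.sock is assumed here as the socket location, and the helper names are hypothetical; the check only inspects the file mode and does not replace a review of who owns the socket and who belongs to its group.

```python
import os
import stat


def mode_problems(mode):
    """Return a list of problems found in a file mode (as given by os.stat)."""
    problems = []
    if not stat.S_ISSOCK(mode):
        problems.append("not a UNIX socket")
    if mode & stat.S_IWOTH:
        # Anyone who can write to the socket can control the daemon.
        problems.append("writable by any user")
    return problems


def socket_permission_problems(path="/var/run/docker.sock"):
    """Check the permissions of the Docker control socket at `path` (assumed location)."""
    return mode_problems(os.stat(path).st_mode)
```

An empty result means the socket is not writable by arbitrary users; access is then limited to the owner and group of the socket file.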

You can also expose the REST API over HTTP if you explicitly decide to. However, if you do, keep the security implication above in mind: make sure the API is reachable only from a trusted network or VPN, or protect it with, for example, stunnel and client SSL certificates. You can also secure it with HTTPS and certificates.

Recent improvements in Linux namespaces will soon make it possible to run full-featured containers without root privileges, thanks to the new user namespace. This is covered in detail here. Moreover, this will solve the problem caused by sharing filesystems between host and guest, since the user namespace allows users within containers (including the root user) to be mapped to other users in the host system.
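The user mapping mentioned above can be illustrated with a short Python sketch. It assumes the three-column layout of /proc/<pid>/uid_map on Linux (first ID inside the namespace, first ID outside, length of the range); the function name and the host range starting at 100000 used below are hypothetical.

```python
def map_uid(uid_map_text, container_uid):
    """Translate a UID inside a user namespace to the UID seen on the host.

    `uid_map_text` uses the uid_map layout: one mapping per line,
    "<first id inside> <first id outside> <length>".
    Returns None when the UID is not covered by any mapping.
    """
    for line in uid_map_text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None
```

With a mapping such as "0 100000 65536", root inside the container (UID 0) corresponds to the unprivileged UID 100000 on the host, so files on a shared filesystem are accessed with that unprivileged identity.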

The end goal for Docker is therefore to implement two additional security improvements:

  • map the root user of a container to a non-root user of the Docker host, to mitigate the effects of a container-to-host privilege escalation;
  • allow the Docker daemon to run without root privileges, and delegate operations requiring those privileges to well-audited sub-processes, each with its own (very limited) scope: virtual network setup, filesystem management, etc.

Finally, if you run Docker on a server, it is recommended to run only Docker on that server, and to move all other services into containers controlled by Docker. Of course, it is fine to keep your favorite admin tools (probably at least an SSH server), as well as existing monitoring/supervision processes (e.g., NRPE, collectd, etc.).

Linux Kernel Capabilities

By default, Docker starts containers with a very restricted set of capabilities. What does that mean?

Capabilities turn the binary "root/non-root" dichotomy into a fine-grained access control system. Processes (like web servers) that just need to bind on a port below 1024 do not have to run as root: they can just be granted the net_bind_service capability instead. And there are many other capabilities, for almost all the specific areas where root privileges are usually needed.

This means a lot for container security; let's see why.

Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include SSH, cron, syslogd; hardware management tools (e.g., to load modules), network configuration tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is very different, because almost all of those tasks are handled by the infrastructure around the container:

  • SSH access will typically be managed by a single server running in the Docker host;
  • cron, when necessary, should run as a user process, dedicated and tailored for the app that needs its scheduling service, rather than as a platform-wide facility;
  • log management will also typically be handed to Docker, or to third-party services like Loggly or Splunk;
  • hardware management is irrelevant, meaning that you never need to run udevd or equivalent daemons within containers;
  • network management happens outside of the containers, enforcing separation of concerns as much as possible, meaning that a container should never need to perform ifconfig, route, or ip commands (except when a container is specifically engineered to behave like a router or firewall, of course).

This means that in most cases, containers will not need "real" root privileges at all. Therefore, containers can run with a reduced capability set, meaning that "root" within a container has far fewer privileges than the real "root". For instance, it is possible to:

  • deny all "mount" operations;
  • deny access to raw sockets (to prevent packet spoofing);
  • deny access to some filesystem operations, like creating new device nodes, changing the owner of files, or altering attributes (including the immutable flag);
  • deny module loading;
  • and many others.

This means that even if an intruder manages to escalate to root within a container, it will be much harder to do serious damage, or to escalate to the host.
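The restrictions listed above correspond to individual capability bits, which a process can inspect for itself. The Python sketch below decodes the hexadecimal capability mask found on the CapEff line of /proc/self/status on Linux. Only a handful of capabilities relevant to the list above are included in the table, and the function names are hypothetical.

```python
# Bit positions of a few Linux capabilities (a partial table).
CAPABILITY_BITS = {
    0: "CAP_CHOWN",              # changing the owner of files
    9: "CAP_LINUX_IMMUTABLE",    # altering the immutable flag
    10: "CAP_NET_BIND_SERVICE",  # binding to ports below 1024
    13: "CAP_NET_RAW",           # raw sockets
    16: "CAP_SYS_MODULE",        # module loading
    21: "CAP_SYS_ADMIN",         # mount operations, among many others
    27: "CAP_MKNOD",             # creating device nodes
}


def decode_capabilities(mask_hex):
    """Return the names (from the partial table) set in a hex capability mask."""
    mask = int(mask_hex, 16)
    return {name for bit, name in CAPABILITY_BITS.items() if mask & (1 << bit)}


def effective_capabilities(status_path="/proc/self/status"):
    """Decode the effective capability set of the current process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_capabilities(line.split()[1])
    return set()
```

Running effective_capabilities() as root inside a container and again as root on the host should show the difference: names that are missing from the container's set are operations that even root in the container cannot perform.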

This won't affect regular web apps, but malicious users will find that the arsenal at their disposal has shrunk considerably. By default Docker drops all capabilities except those needed, taking a whitelist rather than a blacklist approach. You can see a full list of available capabilities in the Linux man pages.

Of course, you can always enable extra capabilities if you really need them (for instance, if you want to use a FUSE-based filesystem), but by default, Docker containers use only a whitelist of kernel capabilities.
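Adjusting the capability whitelist for a single container might be scripted as in the Python sketch below. It assumes a Docker CLI that supports the --cap-drop and --cap-add options of docker run, which may not exist in the release this text describes; the helper name is hypothetical.

```python
def capability_args(add=(), drop_all=True):
    """Build `docker run` arguments that drop all capabilities, then add a few back.

    Capability names are passed without the CAP_ prefix, e.g. "NET_BIND_SERVICE".
    """
    args = []
    if drop_all:
        args.append("--cap-drop=ALL")
    for capability in add:
        args.append(f"--cap-add={capability.upper()}")
    return args
```

For example, capability_args(add=["net_bind_service"]) yields the options for a web server that only needs to bind a port below 1024. Starting from an empty set and adding back only what is needed follows the same whitelist approach Docker uses for its defaults.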

Other Kernel Security Features

Capabilities are just one of the many security features provided by modern Linux kernels. It is also possible to leverage existing, well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with Docker.

While Docker currently only enables capabilities, it doesn't interfere with the other systems. This means that there are many different ways to harden a Docker host. Here are a few examples.

  • You can run a kernel with GRSEC and PAX. This will add many safety checks, both at compile-time and run-time; it will also defeat many exploits, thanks to techniques like address randomization. It doesn't require Docker-specific configuration, since those security features apply system-wide, independently of containers.
  • If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, we ship a template that works with AppArmor, and Red Hat comes with SELinux policies for Docker. These templates provide an extra safety net (even though it overlaps greatly with capabilities).
  • You can define your own policies using your favorite access control mechanism.

Just as there are many third-party tools to augment Docker containers with, e.g., special network topologies or shared filesystems, you can expect to see tools to harden existing Docker containers without affecting Docker's core.

Conclusions

Docker containers are, by default, quite secure, especially if you take care to run your processes inside the containers as non-privileged users (i.e., non-root).

You can add an extra layer of safety by enabling AppArmor, SELinux, GRSEC, or your favorite hardening solution.

Last but not least, if you see interesting security features in other containerization systems, you will be able to implement them with Docker as well, since everything is provided by the kernel anyway.

For more context, and especially for comparisons with VMs and other container systems, please also see the original blog post.

