Network Configuration

TL;DR

When Docker starts, it creates a virtual interface named docker0 on the host machine. It randomly chooses an address and subnet from the private range defined by RFC 1918 that are not in use on the host machine, and assigns it to docker0. Docker made the choice 172.17.42.1/16 when I started it a few minutes ago, for example — a 16-bit netmask providing 65,534 addresses for the host machine and its containers.
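
The 65,534 figure follows from the prefix length: a 16-bit netmask leaves 16 host bits, and two addresses (network and broadcast) are reserved. A minimal sketch of that arithmetic, where the function name usable_hosts is only illustrative:

```shell
#!/bin/sh
# Number of usable host addresses in an IPv4 network with the given
# prefix length: 2^(32 - prefix), minus the network and broadcast addresses.
usable_hosts() {
    prefix=$1
    echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_hosts 16   # the /16 from the example above
usable_hosts 24   # a /24, for comparison
```

The two calls print 65534 and 254 respectively.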

Note: This document discusses advanced networking configuration and options for Docker. In most cases you won't need this information. If you're looking to get started with a simpler explanation of Docker networking and an introduction to the concept of container linking see the Docker User Guide.

But docker0 is no ordinary interface. It is a virtual Ethernet bridge that automatically forwards packets between any other network interfaces that are attached to it. This lets containers communicate both with the host machine and with each other. Every time Docker creates a container, it creates a pair of “peer” interfaces that are like opposite ends of a pipe — a packet sent on one will be received on the other. It gives one of the peers to the container to become its eth0 interface and keeps the other peer, with a unique name like vethAQI2QT, out in the namespace of the host machine. By binding every veth* interface to the docker0 bridge, Docker creates a virtual subnet shared between the host machine and every Docker container.

The remaining sections of this document explain all of the ways that you can use Docker options and — in advanced cases — raw Linux networking commands to tweak, supplement, or entirely replace Docker's default networking configuration.

Quick Guide to the Options

Here is a quick list of the networking-related Docker command-line options, in case it helps you find the section below that you are looking for.

Some networking command-line options can only be supplied to the Docker server when it starts up, and cannot be changed once it is running:

  • -b BRIDGE or --bridge=BRIDGE (see Building your own bridge)

  • --bip=CIDR (see Customizing docker0)

  • -H SOCKET... or --host=SOCKET...: this might sound like it would affect container networking, but it actually faces in the other direction. It tells the Docker server over what channels it should be willing to receive commands like “run container” and “stop container.”

  • --icc=true|false (see Communication between containers)

  • --ip=IP_ADDRESS (see Binding container ports to the host)

  • --ip-forward=true|false (see Communication between containers)

  • --iptables=true|false (see Communication between containers)

  • --mtu=BYTES (see Customizing docker0)

There are two networking options that can be supplied either at startup or when docker run is invoked. When provided at startup, they set the default values that docker run will later use if the options are not specified:

  • --dns=IP_ADDRESS... (see Configuring DNS)

  • --dns-search=DOMAIN... (see Configuring DNS)

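Finally, several networking options can only be provided when calling docker run because they specify something specific to one container:

  • -h HOSTNAME or --hostname=HOSTNAME (see Configuring DNS)

  • --link=CONTAINER_NAME:ALIAS (see Configuring DNS and Communication between containers)

  • --net=bridge|none|container:NAME_or_ID|host (see How Docker networks a container)

  • -p SPEC or --publish=SPEC (see Binding container ports to the host)

  • -P or --publish-all=true|false (see Binding container ports to the host)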

The following sections tackle all of the above topics in an order that moves roughly from simplest to most complex.

Configuring DNS

How can Docker supply each container with a hostname and DNS configuration, without having to build a custom image with the hostname written inside? Its trick is to overlay three crucial /etc files inside the container with virtual files where it can write fresh information. You can see this by running mount inside a container:

$$ mount
...
/dev/disk/by-uuid/1fec...ebdf on /etc/hostname type ext4 ...
/dev/disk/by-uuid/1fec...ebdf on /etc/hosts type ext4 ...
tmpfs on /etc/resolv.conf type tmpfs ...
...

This arrangement allows Docker to do clever things like keep resolv.conf up to date across all containers when the host machine receives new configuration over DHCP later. The exact details of how Docker maintains these files inside the container can change from one Docker version to the next, so you should leave the files themselves alone and use the following Docker options instead.

Four different options affect container domain name services.

  • -h HOSTNAME or --hostname=HOSTNAME: sets the hostname by which the container knows itself. This is written into /etc/hostname, into /etc/hosts as the name of the container's host-facing IP address, and is the name that /bin/bash inside the container will display inside its prompt. But the hostname is not easy to see from outside the container. It will not appear in docker ps nor in the /etc/hosts file of any other container.

  • --link=CONTAINER_NAME:ALIAS: using this option as you run a container gives the new container's /etc/hosts an extra entry named ALIAS that points to the IP address of the container named CONTAINER_NAME. This lets processes inside the new container connect to the hostname ALIAS without having to know its IP. The --link= option is discussed in more detail below, in the section Communication between containers.

  • --dns=IP_ADDRESS...: sets the IP addresses added as nameserver lines to the container's /etc/resolv.conf file. Processes in the container, when confronted with a hostname not in /etc/hosts, will connect to these IP addresses on port 53 looking for name resolution services.

  • --dns-search=DOMAIN...: sets the domain names that are searched when a bare unqualified hostname is used inside of the container, by writing search lines into the container's /etc/resolv.conf. When a container process attempts to access host and the search domain example.com is set, for instance, the DNS logic will not only look up host but also host.example.com.

Note that Docker, in the absence of either of the last two options above, will make /etc/resolv.conf inside of each container look like the /etc/resolv.conf of the host machine where the docker daemon is running. The options then modify this default configuration.
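
To make the effect of --dns and --dns-search concrete, here is a rough sketch of the resolv.conf content they describe. The function name render_resolv_conf is hypothetical, and the file Docker really writes may contain additional lines or differ between versions; the sketch only shows how search domains and name servers map onto search and nameserver lines.

```shell
#!/bin/sh
# render_resolv_conf SEARCH_DOMAINS NAMESERVERS
# Both arguments are space-separated lists; either may be empty.
# Prints resolv.conf-style lines to standard output.
render_resolv_conf() {
    search_domains=$1
    nameservers=$2
    if [ -n "$search_domains" ]; then
        echo "search $search_domains"
    fi
    for ns in $nameservers; do
        echo "nameserver $ns"
    done
}

# Roughly what --dns=8.8.8.8 --dns=8.8.4.4 --dns-search=example.com asks for
render_resolv_conf "example.com" "8.8.8.8 8.8.4.4"
```

The addresses 8.8.8.8 and 8.8.4.4 are placeholders; substitute whichever resolvers your containers should use.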

Communication between containers

Whether two containers can communicate is governed, at the operating system level, by three factors.

  1. Does the network topology even connect the containers' network interfaces? By default Docker will attach all containers to a single docker0 bridge, providing a path for packets to travel between them. See the later sections of this document for other possible topologies.

  2. Is the host machine willing to forward IP packets? This is governed by the ip_forward system parameter. Packets can only pass between containers if this parameter is 1. Usually you will simply leave the Docker server at its default setting --ip-forward=true and Docker will go set ip_forward to 1 for you when the server starts up. To check the setting or turn it on manually:

    # Usually not necessary: turning on forwarding,
    # on the host where your Docker server is running
    
    $ cat /proc/sys/net/ipv4/ip_forward
    0
    $ sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
    $ cat /proc/sys/net/ipv4/ip_forward
    1

  3. Do your iptables allow this particular connection to be made? Docker will never make changes to your system iptables rules if you set --iptables=false when the daemon starts. Otherwise the Docker server will add a default rule to the FORWARD chain with a blanket ACCEPT policy if you retain the default --icc=true, or else will set the policy to DROP if --icc=false.
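
The second factor above is easy to check from a script. The following is a small sketch, not part of Docker itself: the helper name forwarding_enabled is hypothetical, and the optional file argument exists only so the check can be tried against a scratch file.

```shell
#!/bin/sh
# forwarding_enabled [FILE]
# Succeeds (exit status 0) when FILE is readable and contains 1.
# FILE defaults to the kernel's IPv4 forwarding switch.
forwarding_enabled() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    [ -r "$file" ] && [ "$(cat "$file")" = "1" ]
}

if forwarding_enabled; then
    echo "IP forwarding is on"
else
    echo "IP forwarding is off (or could not be read)"
fi
```

Reading the file needs no special privileges; only changing the value requires root.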

Nearly everyone using Docker will want ip_forward to be on, to at least make communication possible between containers. But it is a strategic question whether to leave --icc=true or change it to --icc=false (on Ubuntu, by editing the DOCKER_OPTS variable in /etc/default/docker and restarting the Docker server) so that iptables will protect other containers — and the main host — from having arbitrary ports probed or accessed by a container that gets compromised.

If you choose the most secure setting of --icc=false, then how can containers communicate in those cases where you want them to provide each other services?

The answer is the --link=CONTAINER_NAME:ALIAS option, which was mentioned in the previous section because of its effect upon name services. If the Docker daemon is running with both --icc=false and --iptables=true then, when it sees docker run invoked with the --link= option, the Docker server will insert a pair of iptables ACCEPT rules so that the new container can connect to the ports exposed by the other container, that is, the ports mentioned in the EXPOSE lines of its Dockerfile. Docker has more documentation on this subject: see the linking Docker containers page for further details (a Chinese translation is available at http://blog.csdn.net/smallfish1983/article/details/38636851).

Note: The value CONTAINER_NAME in --link= must either be an auto-assigned Docker name like stupefied_pare or else the name you assigned with --name= when you ran docker run. It cannot be a hostname, which Docker will not recognize in the context of the --link= option.

You can run the iptables command on your Docker host to see whether the FORWARD chain has a default policy of ACCEPT or DROP:

# When --icc=false, you should see a DROP rule:

$ sudo iptables -L -n
...
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  0.0.0.0/0            0.0.0.0/0
...

# When a --link= has been created under --icc=false,
# you should see port-specific ACCEPT rules overriding
# the subsequent DROP policy for all other packets:

$ sudo iptables -L -n
...
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  172.17.0.2           172.17.0.3           tcp spt:80
ACCEPT     tcp  --  172.17.0.3           172.17.0.2           tcp dpt:80
DROP       all  --  0.0.0.0/0            0.0.0.0/0
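
The two ACCEPT rules in the listing above could be produced by commands along the following lines. This is only a sketch: the function name link_rules is hypothetical, the -i/-o bridge matches are an assumption, and the commands Docker actually issues may differ between versions. The function prints the commands rather than running them, so it is safe to try without root.

```shell
#!/bin/sh
# link_rules BRIDGE CLIENT_IP SERVER_IP PORT
# Prints (does not run) a pair of iptables commands that would allow TCP
# traffic between a client container and one port on a server container.
link_rules() {
    bridge=$1
    client=$2
    server=$3
    port=$4
    # Requests: client -> server, destination port PORT
    echo "iptables -I FORWARD -i $bridge -o $bridge -p tcp -s $client -d $server --dport $port -j ACCEPT"
    # Replies: server -> client, source port PORT
    echo "iptables -I FORWARD -i $bridge -o $bridge -p tcp -s $server -d $client --sport $port -j ACCEPT"
}

# The addresses and port from the listing above
link_rules docker0 172.17.0.3 172.17.0.2 80
```

Using -I places the rules at the top of the chain, which matters here because they have to be evaluated before the blanket DROP rule.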

Note: Docker is careful that its host-wide iptables rules fully expose containers to each other's raw IP addresses, so connections from one container to another should always appear to be originating from the first container's own IP address.

Binding container ports to the host

By default Docker containers can make connections to the outside world, but the outside world cannot connect to containers. Each outgoing connection will appear to originate from one of the host machine's own IP addresses thanks to an iptables masquerading rule on the host machine that the Docker server creates when it starts:

# You can see that the Docker server creates a
# masquerade rule that lets containers connect
# to IP addresses in the outside world:

$ sudo iptables -t nat -L -n
...
Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  172.17.0.0/16       !172.17.0.0/16
...

But if you want containers to accept incoming connections, you will need to provide special options when invoking docker run. These options are covered in more detail in the Docker User Guide page. There are two approaches.

First, you can supply -P or --publish-all=true|false to docker run which is a blanket operation that identifies every port with an EXPOSE line in the image's Dockerfile and maps it to a host port somewhere in the range 49000–49900. This tends to be a bit inconvenient, since you then have to run other docker sub-commands to learn which external port a given service was mapped to.

More convenient is the -p SPEC or --publish=SPEC option which lets you be explicit about exactly which external port on the Docker server — which can be any port at all, not just those in the 49000–49900 block — you want mapped to which port in the container.

Either way, you should be able to peek at what Docker has accomplished in your network stack by examining your NAT tables.

# What your NAT rules might look like when Docker
# is finished setting up a -P forward:

$ sudo iptables -t nat -L -n
...
Chain DOCKER (2 references)
target     prot opt source               destination
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:49153 to:172.17.0.2:80

# What your NAT rules might look like when Docker
# is finished setting up a -p 80:80 forward:

Chain DOCKER (2 references)
target     prot opt source               destination
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.17.0.2:80

You can see that Docker has exposed these container ports on 0.0.0.0, the wildcard IP address that will match any incoming address on the host machine. If you want to be more restrictive and only allow container services to be contacted through a specific external interface on the host machine, you have two choices. When you invoke docker run you can use either -p IP:host_port:container_port or -p IP::port to specify the external interface for one particular binding.

Or if you always want Docker port forwards to bind to one specific IP address, you can edit your system-wide Docker server settings (on Ubuntu, by editing DOCKER_OPTS in /etc/default/docker) and add the option --ip=IP_ADDRESS. Remember to restart your Docker server after editing this setting.
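
The -p forms described above can be told apart by counting colons. Here is a rough sketch of that idea, with a hypothetical helper name parse_publish_spec; it is not Docker's own parser, it also accepts a bare container port, and it makes no attempt to handle IPv6 addresses or port ranges.

```shell
#!/bin/sh
# parse_publish_spec SPEC
# Prints "ip=... host_port=... container_port=..." for the -p forms
#   IP:host_port:container_port, IP::container_port,
#   host_port:container_port, and container_port alone.
# An empty ip means "all host addresses"; an empty host_port means
# the host port is left for Docker to choose.
parse_publish_spec() {
    spec=$1
    case $spec in
        *:*:*)
            ip=${spec%%:*}
            rest=${spec#*:}
            host_port=${rest%%:*}
            container_port=${rest#*:}
            ;;
        *:*)
            ip=
            host_port=${spec%%:*}
            container_port=${spec#*:}
            ;;
        *)
            ip=
            host_port=
            container_port=$spec
            ;;
    esac
    echo "ip=$ip host_port=$host_port container_port=$container_port"
}

parse_publish_spec 127.0.0.1:8080:80
parse_publish_spec 127.0.0.1::80
parse_publish_spec 80:80
```

The address 127.0.0.1 and port 8080 are example values only.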

Again, this topic is covered without all of these low-level networking details in the Docker User Guide document if you would like to use that as your port redirection reference instead.

Customizing docker0

By default, the Docker server creates and configures the host system's docker0 interface as an Ethernet bridge inside the Linux kernel that can pass packets back and forth between other physical or virtual network interfaces so that they behave as a single Ethernet network.

Docker configures docker0 with an IP address and netmask so the host machine can both receive and send packets to containers connected to the bridge, and gives it an MTU (the maximum transmission unit, or largest packet length that the interface will allow) of either 1,500 bytes or else a more specific value copied from the Docker host's interface that supports its default route. Both are configurable at server startup:

  • --bip=CIDR: supply a specific IP address and netmask for the docker0 bridge, using standard CIDR notation like 192.168.1.5/24.

  • --mtu=BYTES: override the maximum packet length on docker0.

On Ubuntu you would add these to the DOCKER_OPTS setting in /etc/default/docker on your Docker host and restart the Docker service.

Once you have one or more containers up and running, you can confirm that Docker has properly connected them to the docker0 bridge by running the brctl command on the host machine and looking at the interfaces column of the output. Here is a host with two different containers connected:

# Display bridge info

$ sudo brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.3a1d7362b4ee       no              veth65f9
                                                        vethdda6

If the brctl command is not installed on your Docker host, then on Ubuntu you should be able to run sudo apt-get install bridge-utils to install it.

Finally, the docker0 Ethernet bridge settings are used every time you create a new container. Docker selects a free IP address from the range available on the bridge each time you docker run a new container, and configures the container's eth0 interface with that IP address and the bridge's netmask. The Docker host's own IP address on the bridge is used as the default gateway by which each container reaches the rest of the Internet.

# The network, as seen from a container

$ sudo docker run -i -t --rm base /bin/bash

$$ ip addr show eth0
24: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 32:6f:e0:35:57:91 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.3/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::306f:e0ff:fe35:5791/64 scope link
       valid_lft forever preferred_lft forever

$$ ip route
default via 172.17.42.1 dev eth0
172.17.0.0/16 dev eth0  proto kernel  scope link  src 172.17.0.3

$$ exit

Remember that the Docker host will not be willing to forward container packets out on to the Internet unless its ip_forward system setting is 1 — see the section above on Communication between containers for details.

Building your own bridge

If you want to take Docker out of the business of creating its own Ethernet bridge entirely, you can set up your own bridge before starting Docker and use -b BRIDGE or --bridge=BRIDGE to tell Docker to use your bridge instead. If you already have Docker up and running with its old docker0 still configured, you will probably want to begin by stopping the service and removing the interface:

# Stopping Docker and removing docker0

$ sudo service docker stop
$ sudo ip link set dev docker0 down
$ sudo brctl delbr docker0

Then, before starting the Docker service, create your own bridge and give it whatever configuration you want. Here we will create a simple enough bridge that we really could just have used the options in the previous section to customize docker0, but it will be enough to illustrate the technique.

# Create our own bridge

$ sudo brctl addbr bridge0
$ sudo ip addr add 192.168.5.1/24 dev bridge0
$ sudo ip link set dev bridge0 up

# Confirming that our bridge is up and running

$ ip addr show bridge0
4: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state UP group default
    link/ether 66:38:d0:0d:76:18 brd ff:ff:ff:ff:ff:ff
    inet 192.168.5.1/24 scope global bridge0
       valid_lft forever preferred_lft forever

# Tell Docker about it and restart (on Ubuntu)

$ echo 'DOCKER_OPTS="-b=bridge0"' | sudo tee -a /etc/default/docker
$ sudo service docker start

The result should be that the Docker server starts successfully and is now prepared to bind containers to the new bridge. After pausing to verify the bridge's configuration, try creating a container — you will see that its IP address is in your new IP address range, which Docker will have auto-detected.

Just as we learned in the previous section, you can use the brctl show command to see Docker add and remove interfaces from the bridge as you start and stop containers, and can run ip addr and ip route inside a container to see that it has been given an address in the bridge's IP address range and has been told to use the Docker host's IP address on the bridge as its default gateway to the rest of the Internet.

How Docker networks a container

While Docker is under active development and continues to tweak and improve its network configuration logic, the shell commands in this section are rough equivalents to the steps that Docker takes when configuring networking for each new container.

Let's review a few basics.

To communicate using the Internet Protocol (IP), a machine needs access to at least one network interface at which packets can be sent and received, and a routing table that defines the range of IP addresses reachable through that interface. Network interfaces do not have to be physical devices. In fact, the lo loopback interface available on every Linux machine (and inside each Docker container) is entirely virtual: the Linux kernel simply copies loopback packets directly from the sender's memory into the receiver's memory.

Docker uses special virtual interfaces to let containers communicate with the host machine — pairs of virtual interfaces called “peers” that are linked inside of the host machine's kernel so that packets can travel between them. They are simple to create, as we will see in a moment.

The steps with which Docker configures a container are:

  1. Create a pair of peer virtual interfaces.

  2. Give one of them a unique name like veth65f9, keep it inside of the main Docker host, and bind it to docker0 or whatever bridge Docker is supposed to be using.

  3. Toss the other interface over the wall into the new container (which will already have been provided with an lo interface) and rename it to the much prettier name eth0 since, inside of the container's separate and unique network interface namespace, there are no physical interfaces with which this name could collide.

  4. Give the container's eth0 a new IP address from within the bridge's range of network addresses, and set its default route to the IP address that the Docker host owns on the bridge.

With these steps complete, the container now possesses an eth0 (virtual) network card and will find itself able to communicate with other containers and the rest of the Internet.

You can opt out of the above process for a particular container by giving the --net= option to docker run, which takes four possible values.

  • --net=bridge: the default action, which connects the container to the Docker bridge as described above.

  • --net=host: tells Docker to skip placing the container inside of a separate network stack. In essence, this choice tells Docker to not containerize the container's networking! While container processes will still be confined to their own filesystem and process list and resource limits, a quick ip addr command will show you that, network-wise, they live “outside” in the main Docker host and have full access to its network interfaces. Note that this does not let the container reconfigure the host network stack (that would require --privileged=true), but it does let container processes open low-numbered ports like any other root process. It also allows the container to access local network services like D-bus. This can lead to processes in the container being able to do unexpected things like restart your computer. You should use this option with caution.

  • --net=container:NAME_or_ID: tells Docker to put this container's processes inside of the network stack that has already been created inside of another container. The new container's processes will be confined to their own filesystem and process list and resource limits, but will share the same IP address and port numbers as the first container, and processes on the two containers will be able to connect to each other over the loopback interface.

  • --net=none: tells Docker to put the container inside of its own network stack but not to take any steps to configure its network, leaving you free to build any of the custom configurations explored in the last few sections of this document.

To get an idea of the steps that are necessary if you use --net=none as described in that last bullet point, here are the commands that you would run to reach roughly the same configuration as if you had let Docker do all of the configuration:

# At one shell, start a container and
# leave its shell idle and running

$ sudo docker run -i -t --rm --net=none base /bin/bash
root@63f36fc01b5f:/#

# At another shell, learn the container process ID
# and create its namespace entry in /var/run/netns/
# for the "ip netns" command we will be using below

$ sudo docker inspect -f '{{.State.Pid}}' 63f36fc01b5f
2778
$ pid=2778
$ sudo mkdir -p /var/run/netns
$ sudo ln -s /proc/$pid/ns/net /var/run/netns/$pid     

# Check the bridge's IP address and netmask

$ ip addr show docker0
21: docker0: ...
inet 172.17.42.1/16 scope global docker0
...

# Create a pair of "peer" interfaces A and B,
# bind the A end to the bridge, and bring it up

$ sudo ip link add A type veth peer name B
$ sudo brctl addif docker0 A
$ sudo ip link set A up

# Place B inside the container's network namespace,
# rename to eth0, and activate it with a free IP

$ sudo ip link set B netns $pid
$ sudo ip netns exec $pid ip link set dev B name eth0
$ sudo ip netns exec $pid ip link set eth0 up
$ sudo ip netns exec $pid ip addr add 172.17.42.99/16 dev eth0
$ sudo ip netns exec $pid ip route add default via 172.17.42.1

At this point your container should be able to perform networking operations as usual.

When you finally exit the shell and Docker cleans up the container, the network namespace is destroyed along with our virtual eth0 — whose destruction in turn destroys interface A out in the Docker host and automatically un-registers it from the docker0 bridge. So everything gets cleaned up without our having to run any extra commands! Well, almost everything:

# Clean up dangling symlinks in /var/run/netns

$ sudo find -L /var/run/netns -type l -delete
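
The cleanup command relies on a detail of find: with -L it follows symlinks, so -type l only matches links whose targets no longer exist. The sketch below shows this on a scratch directory, assuming a find that supports -delete (GNU and BSD find both do); the file names are illustrative.

```shell
#!/bin/sh
# Demonstrate on a scratch directory that "find -L DIR -type l" matches
# only symlinks whose targets no longer exist.
demo_dir=$(mktemp -d)
touch "$demo_dir/target"
ln -s "$demo_dir/target" "$demo_dir/live"        # target exists
ln -s "$demo_dir/missing" "$demo_dir/dangling"   # target does not exist

# Same expression as the cleanup command, pointed at the scratch directory
find -L "$demo_dir" -type l -delete

# Only "live" and "target" should remain
ls "$demo_dir"
```

Remove the scratch directory with rm -rf "$demo_dir" when you are done.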

Also note that while the script above used the modern ip command instead of old deprecated wrappers like ifconfig and route, these older commands would also have worked inside of our container. The ip addr command can be typed as ip a if you are in a hurry.

Finally, note the importance of the ip netns exec command, which let us reach inside and configure a network namespace as root. The same commands would not have worked if run inside of the container, because part of safe containerization is that Docker strips container processes of the right to configure their own networks. Using ip netns exec is what let us finish up the configuration without having to take the dangerous step of running the container itself with --privileged=true.

Tools and Examples

Before diving into the following sections on custom network topologies, you might be interested in glancing at a few external tools or examples of the same kinds of configuration. Here are two:

  • Jérôme Petazzoni has created a pipework shell script to help you connect together containers in arbitrarily complex scenarios: https://github.com/jpetazzo/pipework

  • Brandon Rhodes has created a whole network topology of Docker containers for the next edition of Foundations of Python Network Programming that includes routing, NAT'd firewalls, and servers that offer HTTP, SMTP, POP, IMAP, Telnet, SSH, and FTP: https://github.com/brandon-rhodes/fopnp/tree/m/playground

Both tools use networking commands very much like the ones you saw in the previous section, and will see in the following sections.

Building a point-to-point connection

By default, Docker attaches all containers to the virtual subnet implemented by docker0. You can create containers that are each connected to some different virtual subnet by creating your own bridge as shown in Building your own bridge, starting each container with docker run --net=none, and then attaching the containers to your bridge with the shell commands shown in How Docker networks a container.

But sometimes you want two particular containers to be able to communicate directly without the added complexity of both being bound to a host-wide Ethernet bridge.

The solution is simple: when you create your pair of peer interfaces, simply throw both of them into containers, and configure them as classic point-to-point links. The two containers will then be able to communicate directly (provided you manage to tell each container the other's IP address, of course). You might adjust the instructions of the previous section to go something like this:

# Start up two containers in two terminal windows

$ sudo docker run -i -t --rm --net=none base /bin/bash
root@1f1f4c1f931a:/#

$ sudo docker run -i -t --rm --net=none base /bin/bash
root@12e343489d2f:/#

# Learn the container process IDs
# and create their namespace entries

$ sudo docker inspect -f '{{.State.Pid}}' 1f1f4c1f931a
2989
$ sudo docker inspect -f '{{.State.Pid}}' 12e343489d2f
3004
$ sudo mkdir -p /var/run/netns
$ sudo ln -s /proc/2989/ns/net /var/run/netns/2989
$ sudo ln -s /proc/3004/ns/net /var/run/netns/3004

# Create the "peer" interfaces and hand them out

$ sudo ip link add A type veth peer name B

$ sudo ip link set A netns 2989
$ sudo ip netns exec 2989 ip addr add 10.1.1.1/32 dev A
$ sudo ip netns exec 2989 ip link set A up
$ sudo ip netns exec 2989 ip route add 10.1.1.2/32 dev A

$ sudo ip link set B netns 3004
$ sudo ip netns exec 3004 ip addr add 10.1.1.2/32 dev B
$ sudo ip netns exec 3004 ip link set B up
$ sudo ip netns exec 3004 ip route add 10.1.1.1/32 dev B

The two containers should now be able to ping each other and make connections successfully. Point-to-point links like this do not depend on a subnet nor a netmask, but on the bare assertion made by ip route that some other single IP address is connected to a particular network interface.

Note that point-to-point links can be safely combined with other kinds of network connectivity — there is no need to start the containers with --net=none if you want point-to-point links to be an addition to the container's normal networking instead of a replacement.

A final permutation of this pattern is to create the point-to-point link between the Docker host and one container, which would allow the host to communicate with that one container on some single IP address and thus communicate “out-of-band” of the bridge that connects the other, more usual containers. But unless you have very specific networking needs that drive you to such a solution, it is probably far preferable to use --icc=false to lock down inter-container communication, as we explored earlier.
