Preface

In the previous article we covered Node affinity scheduling, but that only makes it easier to express the relationship between a Pod and a Node. Real production environments have additional scheduling requirements for Pods. Some Pods depend on each other and call each other frequently; we would like such Pods deployed in the same data center, or even on the same node. Conversely, two unrelated Pods may compete for resources and affect the other Pods on their node, so we would like them kept as far apart as possible. For this reason, Kubernetes 1.4 introduced Pod affinity and anti-affinity scheduling.

Affinity

If two applications interact frequently, it makes sense to place them as close together as possible to reduce the performance cost of network communication.

Pod affinity is determined mainly by three pieces of configuration.

First, the namespace: which namespace(s) to search for the reference Pods.

Second, the topology domain (topologyKey). A topology domain can be understood as a group of Nodes that usually share the same physical location, such as the same rack, the same data center, or the same region; in the extreme case a single Node can be a topology domain on its own. Kubernetes ships several built-in topology labels:

  • kubernetes.io/hostname
  • topology.kubernetes.io/zone
  • topology.kubernetes.io/region

zone usually represents a smaller fault domain such as a data center or availability zone, while region spans wider and usually represents a geographic area containing multiple zones. kubernetes.io/hostname is set to the node's hostname; the other two are normally populated by the public cloud provider.

Third, the labels of the target Pods: the scheduler finds the nodes running Pods that carry the specified labels and schedules accordingly (in most scenarios this is the part we actually rely on).

You can view a node's built-in labels with kubectl describe:

[root@master ~]# kubectl describe node node01
Name:               node01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node01
                    kubernetes.io/os=linux
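
On this cluster the zone/region labels are absent; they are normally populated by the cloud provider. As a purely illustrative sketch (the zone name zone-a below is made up), you could label the nodes yourself and then widen the topology domain of an affinity rule from a single node to a whole zone:

# hypothetical: put both worker nodes into the same zone
[root@master ~]# kubectl label node node01 topology.kubernetes.io/zone=zone-a
[root@master ~]# kubectl label node node02 topology.kubernetes.io/zone=zone-a

# in an affinity term, use the zone instead of the hostname as the topology domain
#   topologyKey: topology.kubernetes.io/zone  # same zone is enough, not necessarily the same node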

Like Node affinity, Pod affinity comes in a hard-constraint and a soft-constraint form.

All of the configuration items can be inspected with kubectl explain pods.spec.affinity.podAffinity.

The configuration items are as follows:

requiredDuringSchedulingIgnoredDuringExecution  # hard constraint
- namespaces # namespace(s) in which to look for the reference Pods
  topologyKey # topology domain, required
  labelSelector # label selector
    matchExpressions  # a list of selector requirements on Pod labels (recommended)
    - key    # label key
      operator # operator: In, NotIn, Exists, DoesNotExist
      values # label values
    matchLabels    # a {key,value} map, equivalent to the In operator of matchExpressions
  namespaceSelector # still in beta, not covered here

preferredDuringSchedulingIgnoredDuringExecution # soft constraint
- podAffinityTerm  # the affinity term
    namespaces # namespace(s) in which to look for the reference Pods
    topologyKey # topology domain, required
    labelSelector # label selector
      matchExpressions
      - key   # label key
        operator # operator: In, NotIn, Exists, DoesNotExist
        values # label values
      matchLabels  # a {key,value} map, equivalent to the In operator of matchExpressions
  weight # weight, range 1-100
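
None of the examples below set the namespaces field; by default the reference Pods are looked up in the same namespace as the Pod being scheduled. As a minimal sketch (the dev namespace is made up for illustration), restricting the lookup to a specific namespace would look like this:

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - namespaces: ["dev"] # only consider reference Pods in the dev namespace
        labelSelector:
          matchExpressions:
          - key: env
            operator: In
            values: ["pro"]
        topologyKey: kubernetes.io/hostname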

Hard constraint

Pod affinity needs an already-running Pod as a reference, so that the new Pod can be placed in the same topology domain as the reference Pod.

Write the reference Pods in podaffinity-target.yaml. The file defines two Pods, pinned to node01 and node02 respectively, labeled env=pro and env=test:

apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target1
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  nodeName: node01 # pin the target to node01
---
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target2
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  nodeName: node02 # pin the target to node02

Write podaffinity-required.yaml as follows; the affinity rule selects Pods that carry the label env=pro:

apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-required
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAffinity: # use Pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:  # hard constraint
      - labelSelector:
          matchExpressions:
          - key: env
            operator: In
            values: ["pro"]
        topologyKey: kubernetes.io/hostname

Start podaffinity-required directly, without starting podaffinity-target first, and observe the Pod:

# create the Pod
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml 
pod/podaffinity-required created
# check the Pod: it fails to schedule
[root@master pod-affinity]# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
podaffinity-required   0/1     Pending   0          7s    <none>   <none>   <none>           <none>

# the familiar error: one node is tainted, the other two do not match the affinity rule
[root@master pod-affinity]# kubectl describe pod podaffinity-required |grep -A 100 Event
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  20s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.

Now start podaffinity-target first and then podaffinity-required. Since node01 is running a Pod labeled env=pro, podaffinity-required should be scheduled to node01:

# create podaffinity-target
[root@master pod-affinity]# kubectl create -f podaffinity-target.yaml 
pod/podaffinity-target1 created
pod/podaffinity-target2 created
# create podaffinity-required
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml 
pod/podaffinity-required created

# check the Pod: it has been scheduled to node01
[root@master pod-affinity]# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-required   1/1     Running   0          5s    10.244.1.82   node01   <none>           <none>
podaffinity-target1    1/1     Running   0          12s   10.244.1.81   node01   <none>           <none>
podaffinity-target2    1/1     Running   0          12s   10.244.2.28   node02   <none>           <none>

Modify podaffinity-required.yaml, changing values: ["pro"] to values: ["test"], and recreate podaffinity-required. Since node02 is running a Pod labeled env=test, podaffinity-required should now be scheduled to node02:

# delete the previous Pod
[root@master pod-affinity]# kubectl delete -f podaffinity-required.yaml 
pod "podaffinity-required" deleted
# recreate after modifying the yaml
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml 
pod/podaffinity-required created

# check the Pod: it has been scheduled to node02
[root@master pod-affinity]# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-required   1/1     Running   0          4s     10.244.2.29   node02   <none>           <none>
podaffinity-target1    1/1     Running   0          2m6s   10.244.1.81   node01   <none>           <none>
podaffinity-target2    1/1     Running   0          2m6s   10.244.2.28   node02   <none>           <none>

Modify podaffinity-required.yaml again, this time setting values: ["dev"]. No running Pod matches this label, so scheduling should fail:

# delete the previous Pod
[root@master pod-affinity]# kubectl delete -f podaffinity-required.yaml 
pod "podaffinity-required" deleted
# recreate after modifying the yaml
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml 
pod/podaffinity-required created

# check the Pod: it fails to schedule
[root@master pod-affinity]# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-required   0/1     Pending   0          17s     <none>        <none>   <none>           <none>
podaffinity-target1    1/1     Running   0          4m26s   10.244.1.81   node01   <none>           <none>
podaffinity-target2    1/1     Running   0          4m26s   10.244.2.28   node02   <none>           <none>

# the same error again
[root@master pod-affinity]# kubectl describe pod podaffinity-required |grep -A 100 Event
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  28s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.

Soft constraint

As the results above show, with a hard constraint the Pod simply cannot run if nothing matches. A soft constraint behaves the other way: if nothing matches, the scheduler falls back and places the Pod on any node with sufficient resources.

Write podaffinity-preferred.yaml as follows. The label selector is env=dev, for which no matching Pod exists:

apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAffinity: # use Pod affinity
      preferredDuringSchedulingIgnoredDuringExecution:  # soft constraint
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["dev"]
          topologyKey: kubernetes.io/hostname
        weight: 1

Start podaffinity-preferred and check whether the Pod runs normally:

# create podaffinity-preferred
[root@master pod-affinity]# kubectl create -f podaffinity-preferred.yaml 
pod/podaffinity-preferred created

# check the Pod: podaffinity-preferred runs normally
[root@master pod-affinity]# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-preferred   1/1     Running   0          12s   10.244.2.31   node02   <none>           <none>
podaffinity-target1     1/1     Running   0          19s   10.244.1.84   node01   <none>           <none>
podaffinity-target2     1/1     Running   0          19s   10.244.2.30   node02   <none>           <none>

Modify podaffinity-preferred.yaml to define two affinity rules with different weights:

apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAffinity: # use Pod affinity
      preferredDuringSchedulingIgnoredDuringExecution:  # soft constraint
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["pro"]
          topologyKey: kubernetes.io/hostname
        weight: 1
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["test"]
          topologyKey: kubernetes.io/hostname
        weight: 2

After modifying the yaml, recreate podaffinity-preferred. Because the env=test rule carries the larger weight, the Pod should be scheduled to node02:

# create podaffinity-preferred
[root@master pod-affinity]# kubectl create -f podaffinity-preferred.yaml 
pod/podaffinity-preferred created

# check the Pod: it has been scheduled to node02
[root@master pod-affinity]# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-preferred   1/1     Running   0          4s      10.244.2.32   node02   <none>           <none>
podaffinity-target1     1/1     Running   0          2m10s   10.244.1.84   node01   <none>           <none>
podaffinity-target2     1/1     Running   0          2m10s   10.244.2.30   node02   <none>           <none>

Modify podaffinity-preferred.yaml once more, setting the weight of the env=pro rule to 3, and recreate podaffinity-preferred; this time it should be scheduled to node01:

[root@master pod-affinity]# kubectl delete -f podaffinity-preferred.yaml                
pod "podaffinity-preferred" deleted
[root@master pod-affinity]# kubectl create -f podaffinity-preferred.yaml 
pod/podaffinity-preferred created

[root@master pod-affinity]# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-preferred   1/1     Running   0          5s      10.244.1.85   node01   <none>           <none>
podaffinity-target1     1/1     Running   0          2m55s   10.244.1.84   node01   <none>           <none>
podaffinity-target2     1/1     Running   0          2m55s   10.244.2.30   node02   <none>           <none>

Symmetry

Affinity is actually symmetric; let's verify this.

Write another file, podaffinity-target345.yaml, whose Pods also carry the label env=pro:

apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target3
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
---
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target4
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
---
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target5
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally

Start podaffinity-target345. These Pods declare no affinity rules themselves, but podaffinity-preferred on node01 prefers env=pro Pods; if affinity is indeed symmetric, the scheduler should honor that existing preference and place them on node01 as well:

# create podaffinity-target345
[root@master pod-affinity]# kubectl create -f podaffinity-target345.yaml
pod/podaffinity-target3 created
pod/podaffinity-target4 created
pod/podaffinity-target5 created

# all started successfully and were scheduled to node01
[root@master pod-affinity]# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-preferred   1/1     Running   0          31m   10.244.1.85   node01   <none>           <none>
podaffinity-target1     1/1     Running   0          34m   10.244.1.84   node01   <none>           <none>
podaffinity-target2     1/1     Running   0          34m   10.244.2.30   node02   <none>           <none>
podaffinity-target3     1/1     Running   0          10s   10.244.1.87   node01   <none>           <none>
podaffinity-target4     1/1     Running   0          10s   10.244.1.88   node01   <none>           <none>
podaffinity-target5     1/1     Running   0          10s   10.244.1.89   node01   <none>           <none>

Anti-affinity

Anti-affinity has plenty of use cases as well. For example, when deploying an application with multiple replicas, we want the replicas spread across different nodes to improve availability.

Anti-affinity uses exactly the same configuration items as affinity; simply change podAffinity to podAntiAffinity. It likewise comes in a hard-constraint and a soft-constraint form (see the Deployment sketch below for the replica-spreading use case).
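
For spreading replicas, a Deployment can declare anti-affinity against its own label so that replicas repel each other. The following is only a sketch (the app=web label and replica count are made up for illustration and not used in the walkthrough below); note that with a hard constraint and only two schedulable nodes, more than two replicas would stay Pending, so a soft constraint is usually safer for larger replica counts:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent # do not pull if the image exists locally
      affinity:
        podAntiAffinity: # replicas repel each other
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["web"]
            topologyKey: kubernetes.io/hostname # at most one replica per node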

Delete the Pods created in the tests above, keeping only the two original target Pods, one on node01 and one on node02:

[root@master pod-affinity]# kubectl delete -f podaffinity-target345.yaml
pod "podaffinity-target3" deleted
pod "podaffinity-target4" deleted
pod "podaffinity-target5" deleted
[root@master pod-affinity]# kubectl delete -f podaffinity-preferred.yaml 
pod "podaffinity-preferred" deleted

# keep these two targets
[root@master pod-affinity]# kubectl get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-target1   1/1     Running   0          42m   10.244.1.84   node01   <none>           <none>
podaffinity-target2   1/1     Running   0          42m   10.244.2.30   node02   <none>           <none>

Hard constraint

Write podantiaffinity-required.yaml as follows. Anti-affinity is the inverse: this yaml says the Pod must not be scheduled to any node running a Pod labeled env=pro or env=test, which rules out both node01 and node02. Since this is a hard constraint, the Pod should fail to run:

apiVersion: v1
kind: Pod
metadata:
  name: podantiaffinity-required
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAntiAffinity: # use Pod anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution:  # hard constraint
      - labelSelector:
          matchExpressions:
          - key: env
            operator: In
            values: ["pro","test"]
        topologyKey: kubernetes.io/hostname

Start podantiaffinity-required and check whether the Pod runs:

# create podantiaffinity-required
[root@master pod-affinity]# kubectl create -f podantiaffinity-required.yaml
pod/podantiaffinity-required created

# the Pod is Pending: scheduling failed
[root@master pod-affinity]# kubectl get pod -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-target1        1/1     Running   0          53m   10.244.1.84   node01   <none>           <none>
podaffinity-target2        1/1     Running   0          53m   10.244.2.30   node02   <none>           <none>
podantiaffinity-required   0/1     Pending   0          6s    <none>        <none>   <none>           <none>

# the familiar error
[root@master pod-affinity]# kubectl describe pod podantiaffinity-required |grep -A 100 Event
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  40s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod anti-affinity rules.

Change values: ["pro","test"] in podantiaffinity-required.yaml to values: ["pro"], meaning the Pod must not be scheduled to any node running a Pod labeled env=pro, i.e. node01, so it can only go to node02. Recreate podantiaffinity-required:

# delete the previous podantiaffinity-required
[root@master pod-affinity]# kubectl delete -f podantiaffinity-required.yaml       
pod "podantiaffinity-required" deleted

# create podantiaffinity-required
[root@master pod-affinity]# kubectl create -f podantiaffinity-required.yaml
pod/podantiaffinity-required created

# the Pod is running and was scheduled to node02
[root@master pod-affinity]# kubectl get pod -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-target1        1/1     Running   0          54m   10.244.1.84   node01   <none>           <none>
podaffinity-target2        1/1     Running   0          54m   10.244.2.30   node02   <none>           <none>
podantiaffinity-required   1/1     Running   0          4s    10.244.2.33   node02   <none>           <none>

Soft constraint

As before, if no node satisfies the rule, the scheduler simply picks a node with sufficient resources.

Write podantiaffinity-preferred.yaml as follows. It says the Pod should avoid node01 and node02, but there are no other schedulable nodes; because this is a soft constraint, the Pod will still end up on one of the two:

apiVersion: v1
kind: Pod
metadata:
  name: podantiaffinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAntiAffinity: # use Pod anti-affinity
      preferredDuringSchedulingIgnoredDuringExecution:  # soft constraint
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["pro","test"]
          topologyKey: kubernetes.io/hostname
        weight: 1

Start podantiaffinity-preferred and check whether the Pod runs:

# create podantiaffinity-preferred
[root@master pod-affinity]# kubectl create -f podantiaffinity-preferred.yaml 
pod/podantiaffinity-preferred created

# the Pod is running
[root@master pod-affinity]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
podaffinity-target1         1/1     Running   0          58m   10.244.1.84   node01   <none>           <none>
podaffinity-target2         1/1     Running   0          58m   10.244.2.30   node02   <none>           <none>
podantiaffinity-preferred   1/1     Running   0          7s    10.244.1.90   node01   <none>           <none>

Notes
  1. Affinity scheduling does not allow an empty topologyKey.
  2. The hard constraint of anti-affinity does not allow an empty topologyKey either.
  3. For the soft constraint of anti-affinity, an empty topologyKey defaults to the combination of these three labels: kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region.
  4. If the LimitPodHardAntiAffinityTopology admission controller is enabled, hard anti-affinity is restricted to kubernetes.io/hostname; to use a custom topologyKey you need to modify or disable that controller.

Apart from the restrictions above, any valid label key can be used as the topologyKey.

That wraps up Pod affinity and anti-affinity scheduling; next we will cover taints and tolerations.


Thanks for reading, and follow along so you don't miss the next one!
