Preface

The previous article covered directed scheduling (nodeName/nodeSelector). Although convenient, it has drawbacks: it is inflexible and a hard constraint. If no matching node exists, the Pod simply cannot run, which limits its use cases.

To address this, Kubernetes provides affinity scheduling. It extends NodeSelector: you can configure the scheduler to prefer nodes that satisfy the conditions, while still allowing the Pod to be scheduled onto other nodes if none do, replacing the hard constraint of directed scheduling. As affinity scheduling covers more and more of what NodeSelector does, NodeSelector is expected to be deprecated eventually.

There are two kinds of affinity scheduling: node affinity and Pod affinity. This article covers node affinity; Pod affinity will be covered in a later post.

NodeAffinity

NodeAffinity is node affinity scheduling, a newer scheduling policy intended to replace NodeSelector.

nodeAffinity currently has two configuration options:

requiredDuringSchedulingIgnoredDuringExecution

The specified rules must be satisfied for the Pod to be scheduled onto a node, so this is a hard constraint. Once scheduling completes, the condition is no longer re-checked (the IgnoredDuringExecution part). It is very similar to NodeSelector, just with a different syntax, which is why NodeAffinity can replace NodeSelector.

Run kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution to see the configuration options:

requiredDuringSchedulingIgnoredDuringExecution
  nodeSelectorTerms  # <[]Object> -required- list of node selector terms
  - matchFields      # <[]Object> selector requirements by node field
  - matchExpressions # <[]Object> selector requirements by node label (recommended)
    - key      # label key
      operator # operator: In, NotIn, Exists, DoesNotExist, Gt, Lt
      values   # label values

The operators mean the following:

In           # the label value must be in the given list
NotIn        # the opposite of In
Exists       # the label key exists (values is not needed)
DoesNotExist # the label key does not exist
Gt           # the label value is greater than the given value (numeric comparison)
Lt           # the label value is less than the given value (numeric comparison)
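To make the semantics concrete, here is a minimal Python sketch of how each operator evaluates against a node's label map. This is an illustration of the rules above, not the scheduler's real code; one subtlety it captures is that NotIn also matches when the label key is absent.

```python
def match_expression(labels, key, operator, values=None):
    """Evaluate one node-selector requirement against a node's labels (sketch)."""
    present = key in labels
    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        # an absent label key also counts as "not in" the value set
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator == "Gt":
        # Gt/Lt treat the single value as an integer
        return present and int(labels[key]) > int(values[0])
    if operator == "Lt":
        return present and int(labels[key]) < int(values[0])
    raise ValueError(f"unknown operator {operator!r}")
```

For example, `match_expression({"area": "bj"}, "area", "In", ["bj", "changsha"])` matches, mirroring the experiments later in this article.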

Pod affinity, covered later, has an explicit anti-affinity counterpart. Node affinity has no anti-affinity syntax, but NotIn and DoesNotExist achieve the same node-exclusion effect.
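For instance, a hypothetical spec fragment that keeps a Pod off any node labeled area=bj (the area label and value are just the examples used below) could look like this:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: area          # illustrative label key
          operator: NotIn    # exclude nodes whose area label is bj
          values: ['bj']
```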

Use kubectl label to tag node01 as the Beijing machine room and node02 as the Shanghai machine room:

kubectl label nodes node01 area=bj
kubectl label nodes node02 area=shanghai

# verify the labels were set
kubectl get nodes --show-labels

Create nginx-nodeAffinity-required.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity-required
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['bj','changsha']

Start the Pod. Since values: ['bj','changsha'] is set, only node01 qualifies. Check which node the Pod lands on:

# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-required.yaml
pod/nginx-node-affinity-required created

# check the Pod details; it landed on node01
[root@master affinity]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-required   1/1     Running   0          23s   10.244.1.71   node01   <none>           <none>

Change the yaml to values: ['shanghai','changsha']. Now only node02 qualifies. Start the Pod and check which node it lands on:

# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-required.yaml 
pod "nginx-node-affinity-required" deleted

# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-required.yaml
pod/nginx-node-affinity-required created

# check the Pod details; it landed on node02
[root@master affinity]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-required   1/1     Running   0          8s    10.244.2.24   node02   <none>           <none>

Change the yaml to values: ['hangzhou','changsha']. Now no node qualifies. As mentioned, requiredDuringSchedulingIgnoredDuringExecution is a hard constraint, so the Pod should fail to run. Start it and observe its status:

# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-required.yaml 
pod "nginx-node-affinity-required" deleted

# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-required.yaml
pod/nginx-node-affinity-required created

# check the Pod details; status is Pending, it did not start
[root@master affinity]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-required   0/1     Pending   0          14s   <none>   <none>   <none>           <none>

# check the events; scheduling failed
[root@master affinity]# kubectl describe  pod nginx-node-affinity-required|grep -A 100 Event
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  44s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.

preferredDuringSchedulingIgnoredDuringExecution

The scheduler tries to place the Pod on a node matching the specified rules, but if no such node exists it still schedules the Pod elsewhere; this is a soft constraint. With multiple rules, each rule carries a weight, and the weights of all matching rules are added to a node's score, so rules with higher weights take priority.
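The weighting can be sketched as follows. This is a simplified illustration (it ignores the scheduler's other scoring plugins and only handles the In operator): the weights of every matching preference are summed, and the highest-scoring node wins.

```python
def score_node(labels, preferences):
    """Sum the weights of all preferred rules that match this node's labels.

    preferences: list of (weight, key, allowed_values) tuples -- a simplified
    stand-in for preferredDuringSchedulingIgnoredDuringExecution entries.
    """
    return sum(weight for weight, key, allowed in preferences
               if labels.get(key) in allowed)

# The two rules from the yaml below: bj with weight 1, shanghai with weight 2.
prefs = [(1, "area", ["bj"]), (2, "area", ["shanghai"])]
score_node({"area": "bj"}, prefs)        # node01 scores 1
score_node({"area": "shanghai"}, prefs)  # node02 scores 2, so it is preferred
```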

Run kubectl explain pods.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution to see the configuration options:

preferredDuringSchedulingIgnoredDuringExecution
- weight       # weight, range 1-100
  preference   # a node selector term, associated with the weight
    matchFields      # <[]Object> selector requirements by node field
    matchExpressions # <[]Object> selector requirements by node label (recommended)
      key      # label key
      operator # operator
      values   # <[]string> label values

Create nginx-nodeAffinity-preferred.yaml with the following content: two matching rules, prioritized by weight.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['bj']
      - weight: 2
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['shanghai']

Start the Pod. There are two rules, and shanghai carries a higher weight than bj, so the Pod should land on node02. Check:

# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-preferred.yaml
pod/nginx-node-affinity-preferred created

# check the Pod details; it landed on node02
[root@master affinity]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-preferred   1/1     Running   0          12s   10.244.2.25   node02   <none>           <none>

Change the yaml to give bj a weight of 3, start the Pod again, and check which node it lands on:

# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-preferred.yaml
pod "nginx-node-affinity-preferred" deleted

# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-preferred.yaml
pod/nginx-node-affinity-preferred created

# check the Pod details; it landed on node01
[root@master affinity]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-preferred   1/1     Running   0          8s    10.244.1.72   node01   <none>           <none>

Change the yaml as follows, so that neither rule matches any node:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['changsha']
      - weight: 2
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['hangzhou']

Start the Pod. As noted above, this is a soft constraint: if no node satisfies the rules, the scheduler falls back to any node with sufficient resources.

# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-preferred.yaml
pod "nginx-node-affinity-preferred" deleted

# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-preferred.yaml
pod/nginx-node-affinity-preferred created

# scheduled onto node01
[root@master affinity]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-preferred   1/1     Running   0          4s    10.244.1.74   node01   <none>           <none>

Questions

Since nodeSelector is not yet deprecated, how does scheduling behave when nodeSelector and nodeAffinity are both set?

From the affinity schema above, nodeSelectorTerms is an array and can hold multiple matchExpressions blocks. How are multiple blocks matched?

matchExpressions is itself an array and can hold multiple key/operator/values entries. How are those matched?

Defining nodeSelector and nodeAffinity together

Create affinity-nodeselector.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-nodeselector
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['bj']
  nodeSelector:
    area: shanghai

Start the Pod. Both settings are hard constraints, and their conditions conflict, so the Pod should be unschedulable.

# create the Pod
[root@master affinity]# kubectl create -f affinity-nodeselector.yaml 
pod/affinity-nodeselector created

# check the Pod status; it did not start
[root@master affinity]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
affinity-nodeselector   0/1     Pending   0          9s

# check the events; the familiar error again
[root@master affinity]# kubectl describe pod affinity-nodeselector | grep -A 100 Events
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  34s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.

Change the nodeSelector condition to bj as well, and the Pod starts successfully:

[root@master affinity]# kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
affinity-nodeselector   1/1     Running   0          7s    10.244.1.75   node01   <none>           <none>

How nodeAffinity behaves with multiple matchExpressions blocks (multiple nodeSelectorTerms entries)

Create multiple-nodeselectorterms.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: multiple-nodeselectorterms
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['changsha']
        - matchExpressions:
          - key: area
            operator: In
            values: ['shanghai']

Start the Pod. If it starts successfully on node02, matching one block is enough; if it fails to start, all blocks must match.

# create the Pod
[root@master affinity]# kubectl create -f multiple-nodeselectorterms.yaml                
pod/multiple-nodeselectorterms created

# check the Pod details; it started successfully and landed on node02
[root@master affinity]# kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
multiple-nodeselectorterms   1/1     Running   0          19s   10.244.2.26   node02   <none>           <none>

Multiple entries within one matchExpressions

Create multiple-matchexpressions.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: multiple-matchexpressions
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['changsha']
          - key: area
            operator: In
            values: ['shanghai']

Start the Pod. If it starts successfully on node02, matching one entry is enough; if it fails to start, all entries must match.

# create the Pod
[root@master affinity]# kubectl create -f multiple-matchexpressions.yaml
pod/multiple-matchexpressions created

# check the Pod; it failed to start
[root@master affinity]# kubectl get pods -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
multiple-matchexpressions   0/1     Pending   0          11s   <none>   <none>   <none>           <none>

# check the events; the familiar error again
[root@master affinity]# kubectl describe pod multiple-matchexpressions | grep -A 100 Events                     
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  48s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.

Summary

If nodeSelector and nodeAffinity are both defined, both conditions must be satisfied for the Pod to be scheduled onto a node.

With multiple matchExpressions blocks under nodeSelectorTerms, matching any one of them is enough (logical OR).

With multiple entries inside a single matchExpressions block, all of them must match (logical AND).
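The three rules above can be sketched as one filter function. This is a simplified illustration, not the scheduler's real code; for brevity it only handles the In operator and exact-match nodeSelector labels.

```python
def node_matches(labels, node_selector, node_selector_terms):
    """Does a node pass both nodeSelector and required nodeAffinity? (sketch)

    node_selector: dict of exact label matches.
    node_selector_terms: list of terms; each term is a list of
    {"key": ..., "values": [...]} expressions using the In operator.
    """
    # Rule 1: nodeSelector is ANDed with the affinity -- every pair must match.
    if any(labels.get(k) != v for k, v in node_selector.items()):
        return False
    # Rule 2: the terms are ORed -- one matching term suffices.
    # Rule 3: within a term, the expressions are ANDed -- all must match.
    return any(
        all(labels.get(expr["key"]) in expr["values"] for expr in term)
        for term in node_selector_terms
    )
```

Running it against the node02 labels from the experiments: the two-term case (changsha OR shanghai) matches, while the single term requiring changsha AND shanghai does not, matching the observed Pending state.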

That wraps up node affinity. Pod affinity and anti-affinity scheduling are covered next.
