3.10 Node affinity and pod affinity/anti-affinity
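When the nodes in a cluster are configured differently, pods with particular roles need to be scheduled onto particular nodes. Kubernetes offers several ways to control where a pod is scheduled. The examples below assume a cluster with one master node and two worker nodes, k8s-node01 and k8s-node02. Label k8s-node01 with prod=dev and k8s-node02 with prod=sit: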
kubectl label node k8s-node01 prod=dev
kubectl label node k8s-node02 prod=sit
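I. Scheduling pods onto a fixed node
1. Scheduling a pod onto a specific node with a label selector
A pod manifest can state which node the pod should be scheduled onto, as in pod1.yaml below: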
apiVersion: v1
kind: Pod
metadata:
  name: node-selector-pod
spec:
  nodeSelector:
    prod: sit   # schedule the pod onto a node labeled prod=sit
  containers:
  - image: busybox
    name: busybox
    command: ["sh", "-c", "sleep 3600"]
    imagePullPolicy: IfNotPresent
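After creating the pod from this YAML file, the pod is scheduled onto k8s-node02 (the node labeled prod=sit). Deleting the pod and creating it again lands it on k8s-node02 once more, as shown below. If no node carries the label named in the pod spec, the pod stays in the Pending state until a node with a matching label exists, at which point it is scheduled and moves to Running.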
[root@k8s-master01 affinity_work]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-selector-pod 1/1 Running 0 10s 10.244.2.164 k8s-node02 <none> <none>
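The nodeSelector rule can be modeled in a few lines of Python. This is only a simplified sketch of the matching rule, not the scheduler's code; the names `matches_node_selector`, `eligible_nodes`, and the `nodes` dictionary are hypothetical and simply mirror the labels applied above.

```python
# Simplified model of nodeSelector matching (illustrative only, not scheduler code).
# Node labels mirror the `kubectl label` commands used earlier in this article.
nodes = {
    "k8s-node01": {"prod": "dev"},
    "k8s-node02": {"prod": "sit"},
}

def matches_node_selector(node_labels, node_selector):
    """A node qualifies only if it carries every key=value pair in nodeSelector."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())

def eligible_nodes(node_selector):
    """Return the names of all nodes that satisfy the selector, sorted."""
    return sorted(name for name, labels in nodes.items()
                  if matches_node_selector(labels, node_selector))

print(eligible_nodes({"prod": "sit"}))   # ['k8s-node02']
print(eligible_nodes({"prod": "uat"}))   # [] -> the pod would stay Pending
```

An empty result corresponds to the Pending case described above: no node matches, so the pod waits.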
2. Pinning pods to a node by node name
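apiVersion: apps/v1   # extensions/v1beta1 no longer serves ReplicaSet as of Kubernetes 1.16
kind: ReplicaSet
metadata:
  name: my-replica
spec:
  replicas: 3
  selector:   # required by apps/v1; must match the template labels
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      nodeName: k8s-node01   # all 3 replica pods are placed on k8s-node01
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
        imagePullPolicy: IfNotPresent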
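With nodeName: k8s-node01 naming a specific node, every replica pod is placed on k8s-node01. Because nodeName names the node directly, the scheduler's node selection is bypassed for these pods.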
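II. Scheduling pods with node affinity
Originally, the only way to steer a pod onto a particular node was a label selector in the pod spec. As pod scheduling evolved, newer mechanisms became available, such as the node affinity (nodeAffinity) covered in this section, which states what kind of node a pod prefers to run on. Node affinity comes in two forms:
- requiredDuringSchedulingIgnoredDuringExecution: hard affinity. The pod must be placed on a node carrying the specified label; if no such node exists, the pod stays in the Pending state.
- preferredDuringSchedulingIgnoredDuringExecution: soft affinity. The pod prefers nodes carrying the specified label; if no such node exists, the pod is scheduled onto another node. When many pods target the labeled nodes, a few may still land elsewhere, which avoids having every pod on a single node where one node failure would affect all of them. Soft affinity therefore expresses a preference, not a requirement.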
1. Using requiredDuringSchedulingIgnoredDuringExecution
Taking pod2.yaml below as an example, create a pod and use requiredDuringSchedulingIgnoredDuringExecution to state that it must be scheduled onto a node labeled prod=dev:
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
spec:
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "sleep 3600"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: prod   # affinity-pod must be scheduled onto a node labeled prod=dev
            operator: In
            values:
            - dev
After creating affinity-pod, checking where it was scheduled shows that it landed on k8s-node01 (the node labeled prod=dev). Even after deleting affinity-pod and creating it again, it is still scheduled onto k8s-node01.
[root@k8s-master01 affinity_work]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
affinity-pod 1/1 Running 0 6s 10.244.1.190 k8s-node01 <none> <none>
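The way a required node affinity rule filters nodes can be sketched as follows. This is a simplified model under stated assumptions, not the scheduler's implementation: the function names are hypothetical, only four operators are modeled, and the sketch relies on the documented behavior that nodeSelectorTerms are ORed while the matchExpressions inside one term are ANDed.

```python
# Simplified model of required node affinity (illustrative only).
def expr_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modeled in this sketch: {op}")

def node_satisfies(labels, node_selector_terms):
    """Terms are ORed together; expressions inside one term are ANDed."""
    return any(
        all(expr_matches(labels, e) for e in term["matchExpressions"])
        for term in node_selector_terms
    )

nodes = {"k8s-node01": {"prod": "dev"}, "k8s-node02": {"prod": "sit"}}
terms = [{"matchExpressions": [
    {"key": "prod", "operator": "In", "values": ["dev"]}]}]

print([name for name, labels in nodes.items() if node_satisfies(labels, terms)])
# ['k8s-node01']
```

Only k8s-node01 passes the filter, which matches the placement observed in the output above.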
2. Using preferredDuringSchedulingIgnoredDuringExecution
Taking pod3.yaml as an example, create a ReplicaSet and use preferredDuringSchedulingIgnoredDuringExecution to state that its pods prefer nodes that are not labeled prod=dev.
apiVersion: apps/v1   # extensions/v1beta1 no longer serves ReplicaSet as of Kubernetes 1.16
kind: ReplicaSet
metadata:
  name: node-affinity-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            preference:
              matchExpressions:
              - key: prod
                operator: NotIn
                values:
                - dev
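After creating the ReplicaSet from this manifest, the pods tend to land on k8s-node02. Although preferredDuringSchedulingIgnoredDuringExecution states a preference for nodes not labeled prod=dev, a small number of pods (just 1 in this example) were still scheduled onto k8s-node01, which is labeled prod=dev.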
[root@k8s-master01 affinity_work]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity-replicaset-bwm49 1/1 Running 0 12s 10.244.2.165 k8s-node02 <none> <none>
node-affinity-replicaset-k96w6 1/1 Running 0 12s 10.244.2.166 k8s-node02 <none> <none>
node-affinity-replicaset-www2m 1/1 Running 0 12s 10.244.1.191 k8s-node01 <none> <none>
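Once a preference is declared with preferredDuringSchedulingIgnoredDuringExecution, the scheduler takes the configured weights (1 to 100) into account: the higher a term's weight, the more strongly nodes matching that term are favored. The example above has a single term with weight: 80, so it is the only preference in play. Several terms with different weights can also be set, in which case the scheduler leans toward nodes according to the relative weights. The fragment below sets two weights: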
..... omitted .....
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80   # higher weight
            preference:
              matchExpressions:
              - key: prod
                operator: NotIn
                values:
                - dev
          - weight: 20   # lower weight
            preference:
              matchExpressions:
              - key: prod
                operator: In
                values:
                - uat
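Note: pods are scheduled onto the higher-priority nodes, but some also land on lower-priority nodes. The reason is that node affinity is not the only priority function: the scheduler combines it with other priority functions when deciding which node a pod goes to. One of them is the SelectorSpreadPriority function, which spreads pods belonging to the same ReplicaSet or Service across different nodes so that the failure of a single node does not take the whole service down.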
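The weighting can be illustrated with a small sketch: for each node, add up the weights of every preference term the node satisfies. This models only the node affinity part of the score; the function names are hypothetical, and k8s-node03 is an invented node added to show both terms matching at once.

```python
# Simplified model of preferred node affinity scoring (illustrative only).
# The real scheduler combines this value with other priority functions.
def expr_matches(labels, expr):
    """Evaluate one matchExpressions entry (only In and NotIn are modeled here)."""
    key, op, values = expr["key"], expr["operator"], expr["values"]
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    raise ValueError(f"operator not modeled in this sketch: {op}")

def affinity_score(labels, preferred_terms):
    """Sum the weights of every preference term the node satisfies."""
    return sum(
        term["weight"]
        for term in preferred_terms
        if all(expr_matches(labels, e)
               for e in term["preference"]["matchExpressions"])
    )

preferred = [
    {"weight": 80, "preference": {"matchExpressions": [
        {"key": "prod", "operator": "NotIn", "values": ["dev"]}]}},
    {"weight": 20, "preference": {"matchExpressions": [
        {"key": "prod", "operator": "In", "values": ["uat"]}]}},
]
nodes = {
    "k8s-node01": {"prod": "dev"},
    "k8s-node02": {"prod": "sit"},
    "k8s-node03": {"prod": "uat"},   # hypothetical node, not in the article's cluster
}

scores = {name: affinity_score(labels, preferred) for name, labels in nodes.items()}
print(scores)   # {'k8s-node01': 0, 'k8s-node02': 80, 'k8s-node03': 100}
```

A score of 0 does not exclude k8s-node01; it only makes the node less attractive, which is why a pod can still end up there once the other priority functions are added in.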
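III. Scheduling pods with pod affinity
The node affinity described above places pods according to their relationship with nodes; the pod affinity introduced in this section places pods according to their relationship with other pods. Like node affinity, pod affinity comes in two forms: requiredDuringSchedulingIgnoredDuringExecution (hard affinity) and preferredDuringSchedulingIgnoredDuringExecution (preferred affinity). Before looking at pod affinity, we first introduce the topology domain (topologyKey).
1. Topology domain (topologyKey)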
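A topology domain is a scope: it can be a single node, a rack, a machine room, or a region. In practice the scope is defined by node labels: nodes that share the same value for the label key named in topologyKey form one domain. For example, if 3 nodes all carry the label prod=dev, those 3 nodes form one topology domain. Likewise, if every node in the first rack is labeled pc=pc1 and every node in the second rack is labeled pc=pc2, the first rack is one topology domain and the second rack is another. In the same way, Node1, Node2, and Node3 can belong to one topology domain while Node4, Node5, and Node6 belong to another.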
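Grouping nodes into topology domains amounts to bucketing them by the value of one label key. The sketch below reuses the rack example (pc=pc1, pc=pc2); the function name `topology_domains` is hypothetical, and treating nodes without the label as belonging to no domain is an assumption of this model.

```python
from collections import defaultdict

def topology_domains(nodes, topology_key):
    """Group node names by the value of the label named by topology_key.

    Nodes that do not carry the label belong to no domain in this sketch.
    """
    domains = defaultdict(list)
    for name, labels in nodes.items():
        if topology_key in labels:
            domains[labels[topology_key]].append(name)
    return dict(domains)

# Rack example from the text: the first rack is labeled pc=pc1, the second pc=pc2.
nodes = {
    "Node1": {"pc": "pc1"}, "Node2": {"pc": "pc1"}, "Node3": {"pc": "pc1"},
    "Node4": {"pc": "pc2"}, "Node5": {"pc": "pc2"}, "Node6": {"pc": "pc2"},
}

print(topology_domains(nodes, "pc"))
# {'pc1': ['Node1', 'Node2', 'Node3'], 'pc2': ['Node4', 'Node5', 'Node6']}
```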
2. Using requiredDuringSchedulingIgnoredDuringExecution
requiredDuringSchedulingIgnoredDuringExecution works the same way in pod affinity as in node affinity. In pod affinity it means that the pod being scheduled must satisfy the stated relationship with pods already running on the nodes. For example:
apiVersion: apps/v1   # extensions/v1beta1 no longer serves ReplicaSet as of Kubernetes 1.16
kind: ReplicaSet
metadata:
  name: affinity-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
        imagePullPolicy: IfNotPresent
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - busybox1
            topologyKey: prod
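Deploying affinity-replicaset creates 3 busybox pods, each of which must satisfy the hard affinity: it must be placed in a topology domain (nodes sharing the same value of the prod label, the topologyKey) where a pod labeled app=busybox1 is already running. As shown below, the app=busybox1 pod runs on k8s-node02, the only node labeled prod=sit, so all 3 new pods were scheduled onto k8s-node02.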
[root@k8s-master01 affinity_work]# kubectl get pod -o wide --show-labels
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
affinity-replicaset-4tdfn 1/1 Running 0 19s 10.244.2.168 k8s-node02 <none> <none> app=busybox
affinity-replicaset-lwx88 1/1 Running 0 19s 10.244.2.169 k8s-node02 <none> <none> app=busybox
affinity-replicaset-vldgc 1/1 Running 0 19s 10.244.2.170 k8s-node02 <none> <none> app=busybox
busybox-pod 1/1 Running 0 2m2s 10.244.2.167 k8s-node02 <none> <none> app=busybox1
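The filtering behind this result can be sketched in Python. This is a simplified model, not the scheduler's code: the function names are hypothetical, and the label selector is reduced to plain key=value equality, which is equivalent to the single `In` expression used above.

```python
# Simplified model of required pod affinity (illustrative only).
nodes = {"k8s-node01": {"prod": "dev"}, "k8s-node02": {"prod": "sit"}}
running_pods = [
    {"name": "busybox-pod", "labels": {"app": "busybox1"}, "node": "k8s-node02"},
]

def matching_domains(pods, nodes, match_labels, topology_key):
    """Collect topologyKey values of nodes that run a pod matching match_labels."""
    domains = set()
    for pod in pods:
        if all(pod["labels"].get(k) == v for k, v in match_labels.items()):
            node_labels = nodes[pod["node"]]
            if topology_key in node_labels:
                domains.add(node_labels[topology_key])
    return domains

def affinity_candidates(pods, nodes, match_labels, topology_key):
    """Nodes whose topologyKey value is a domain that hosts a matching pod."""
    domains = matching_domains(pods, nodes, match_labels, topology_key)
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) in domains)

print(affinity_candidates(running_pods, nodes, {"app": "busybox1"}, "prod"))
# ['k8s-node02']
```

If both nodes shared the same prod value, both would be candidates, because the rule is evaluated per topology domain, not per node.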
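3. Using preferredDuringSchedulingIgnoredDuringExecution
preferredDuringSchedulingIgnoredDuringExecution works the same way in pod affinity as in node affinity: it expresses a preference for a placement, not a requirement. Modify the YAML above as follows: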
apiVersion: apps/v1   # extensions/v1beta1 no longer serves ReplicaSet as of Kubernetes 1.16
kind: ReplicaSet
metadata:
  name: affinity-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
        imagePullPolicy: IfNotPresent
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:   # preferred (soft) affinity
          - weight: 20
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - busybox1
              topologyKey: prod
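The 3 pods created by affinity-replicaset are scheduled under a preferred affinity: they lean toward the topology domain (defined by the prod label) where a pod labeled app=busybox1 is running. As shown below, the app=busybox1 pod runs on k8s-node02. Of the 3 new pods, 2 were scheduled onto k8s-node02 and 1 onto k8s-node01, which shows the preference for k8s-node02 without making it mandatory.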
[root@k8s-master01 affinity_work]# kubectl get pod -o wide --show-labels
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
affinity-replicaset-78jtc 1/1 Running 0 24s 10.244.1.192 k8s-node01 <none> <none> app=busybox
affinity-replicaset-flpwh 1/1 Running 0 24s 10.244.2.172 k8s-node02 <none> <none> app=busybox
affinity-replicaset-tlr2r 1/1 Running 0 24s 10.244.2.173 k8s-node02 <none> <none> app=busybox
busybox-pod 1/1 Running 0 99s 10.244.2.171 k8s-node02 <none> <none> app=busybox1
IV. Scheduling pods with pod anti-affinity
Pod anti-affinity is used the same way as pod affinity: simply change podAffinity to podAntiAffinity in the YAML, as shown below.
..... omitted .....
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - busybox1
            topologyKey: prod
..... omitted .....
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 20
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - busybox1
              topologyKey: prod
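With requiredDuringSchedulingIgnoredDuringExecution, pod anti-affinity means the pod must be placed opposite to what the rule describes: it must not land in a topology domain that already runs a pod matching the selector.
With preferredDuringSchedulingIgnoredDuringExecution, pod anti-affinity means the scheduler tries to keep the pod away from such domains, but a small number of pods may still end up there.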
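The required anti-affinity case is the mirror image of required affinity, as the sketch below illustrates. It is a simplified model with hypothetical function names; the selector is reduced to key=value equality, and nodes lacking the topologyKey label are treated as allowed, which is an assumption of this sketch.

```python
# Simplified model of required pod anti-affinity (illustrative only).
nodes = {"k8s-node01": {"prod": "dev"}, "k8s-node02": {"prod": "sit"}}
running_pods = [
    {"name": "busybox-pod", "labels": {"app": "busybox1"}, "node": "k8s-node02"},
]

def occupied_domains(pods, nodes, match_labels, topology_key):
    """Collect topologyKey values of nodes that run a pod matching match_labels."""
    domains = set()
    for pod in pods:
        if all(pod["labels"].get(k) == v for k, v in match_labels.items()):
            node_labels = nodes[pod["node"]]
            if topology_key in node_labels:
                domains.add(node_labels[topology_key])
    return domains

def anti_affinity_candidates(pods, nodes, match_labels, topology_key):
    """Nodes outside every domain that already hosts a matching pod."""
    occupied = occupied_domains(pods, nodes, match_labels, topology_key)
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) not in occupied)

print(anti_affinity_candidates(running_pods, nodes, {"app": "busybox1"}, "prod"))
# ['k8s-node01']
```

With the app=busybox1 pod on k8s-node02, only k8s-node01 remains a candidate in this model; the preferred variant would lower the score of k8s-node02 instead of excluding it.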
V. Comparing the affinity and anti-affinity scheduling strategies