Kubernetes resource scheduling (nodeSelector, nodeAffinity, taints & tolerations, nodeName)
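Resource scheduling, labels, and taints
Workflow for creating a Pod
- Kubernetes uses a controller architecture built on the list-watch mechanism, which decouples the interactions between components.
- Each component watches the resources it is responsible for. When those resources change, kube-apiserver notifies the watching components, much like a publish/subscribe system.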
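The publish/subscribe idea can be illustrated with a small in-memory model. This is only a sketch: `ToyApiServer`, `list_objects`, `watch`, `apply`, and `run_demo` are hypothetical names invented for illustration, not the real Kubernetes API, and the "scheduler" always picks node1.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class ToyApiServer:
    """Hypothetical in-memory stand-in for kube-apiserver."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, dict]] = defaultdict(dict)
        self._watchers: Dict[str, List[Handler]] = defaultdict(list)

    def list_objects(self, kind: str) -> List[dict]:
        """Return the current objects of one kind (the 'list' half)."""
        return list(self._objects[kind].values())

    def watch(self, kind: str, handler: Handler) -> None:
        """Register a callback for later changes (the 'watch' half)."""
        self._watchers[kind].append(handler)

    def apply(self, kind: str, obj: dict) -> None:
        """Store an object, then notify every watcher of that kind."""
        event = "MODIFIED" if obj["name"] in self._objects[kind] else "ADDED"
        self._objects[kind][obj["name"]] = obj
        for handler in list(self._watchers[kind]):
            handler(event, obj)


def run_demo() -> List[str]:
    """Create one Pod and record which component reacts, in order."""
    api = ToyApiServer()
    log: List[str] = []

    def scheduler(event: str, pod: dict) -> None:
        # The scheduler only reacts to Pods that have no node yet.
        if pod.get("nodeName") is None:
            log.append(f"scheduler: bind {pod['name']} to node1")
            api.apply("Pod", dict(pod, nodeName="node1"))

    def kubelet_node1(event: str, pod: dict) -> None:
        # The kubelet only reacts to Pods bound to its own node.
        if pod.get("nodeName") == "node1":
            log.append(f"kubelet(node1): start {pod['name']}")

    api.watch("Pod", scheduler)
    api.watch("Pod", kubelet_node1)
    api.apply("Pod", {"name": "nginx", "nodeName": None})
    return log
```

Neither component calls the other directly. Each one only talks to the API server, which is the decoupling the list-watch design is meant to provide.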
// Help
[root@master ~]# kubectl explain deploy.spec.template.spec.containers.resources
KIND: Deployment
VERSION: apps/v1
RESOURCE: resources <Object>
DESCRIPTION:
Compute Resources required by this container. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
ResourceRequirements describes the compute resource requirements.
FIELDS:
limits <map[string]string>
Limits describes the maximum amount of compute resources allowed. More
info:
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
requests <map[string]string>
Requests describes the minimum amount of compute resources required. If
Requests is omitted for a container, it defaults to Limits if that is
explicitly specified, otherwise to an implementation-defined value. More
info:
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
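Main Pod attributes that affect scheduling
How resource limits affect Pod scheduling
Container resource limits (the maximum a container may use):
- resources.limits.cpu
- resources.limits.memory
Minimum resources the container needs, which the scheduler uses as the basis for placing the Pod:
- resources.requests.cpu
- resources.requests.memory
CPU units: a value can be written in millicores (m) or as a decimal number, for example 0.5 = 500m and 1 = 1000m.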
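The equivalence between the two CPU notations can be checked with a few lines of Python. This is a minimal sketch; `cpu_to_millicores` is a hypothetical helper written for this note, not a Kubernetes function, and it handles only the two forms mentioned above.

```python
def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as "500m", "0.5" or "1" to millicores."""
    value = value.strip()
    if value.endswith("m"):
        # Already expressed in millicores, e.g. "250m".
        return int(value[:-1])
    # Plain number of cores, e.g. "0.5" or "1".
    return round(float(value) * 1000)
```

With this helper, `cpu_to_millicores("0.5")` and `cpu_to_millicores("500m")` both give 500.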
Example
// Kubernetes uses the request values to find a Node with enough unreserved resources for this Pod
[root@master ~]# cat tt.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
[root@master ~]# kubectl apply -f tt.yml
pod/nginx created
[root@master ~]# kubectl describe node node1
Name: node1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node1
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"fe:b4:c1:77:05:a5"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.129.135
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 18 Dec 2021 16:30:11 +0800
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: node1
AcquireTime: <unset>
RenewTime: Thu, 23 Dec 2021 20:35:59 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 23 Dec 2021 19:56:36 +0800 Thu, 23 Dec 2021 19:56:36 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Thu, 23 Dec 2021 20:31:46 +0800 Wed, 22 Dec 2021 04:32:02 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 23 Dec 2021 20:31:46 +0800 Wed, 22 Dec 2021 04:32:02 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 23 Dec 2021 20:31:46 +0800 Wed, 22 Dec 2021 04:32:02 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 23 Dec 2021 20:31:46 +0800 Wed, 22 Dec 2021 04:32:02 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.129.135
Hostname: node1
Capacity:
cpu: 2
ephemeral-storage: 36731368Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3842264Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 33851628693
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3739864Ki
pods: 110
System Info:
Machine ID: d2c10a72b80c45679e2c249297ecb522
System UUID: 5b114d56-95d2-7774-4a94-988a30aa87a6
Boot ID: 4ce4678f-9d93-4c07-8f8f-de94070d807f
Kernel Version: 4.18.0-193.el8.x86_64
OS Image: Red Hat Enterprise Linux 8.2 (Ootpa)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.12
Kubelet Version: v1.20.0
Kube-Proxy Version: v1.20.0
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default web 250m (12%) 500m (25%) 64Mi (1%) 128Mi (3%) 5s
kube-system kube-flannel-ds-c9z87 100m (5%) 100m (5%) 50Mi (1%) 50Mi (1%) 5d3h
kube-system kube-proxy-9z78l 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d4h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 350m (17%) 600m (30%)
memory 114Mi (3%) 178Mi (4%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 39m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 39m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 39m (x7 over 39m) kubelet Node node1 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 39m (x7 over 39m) kubelet Node node1 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 39m (x7 over 39m) kubelet Node node1 status is now: NodeHasSufficientPID
Warning Rebooted 39m kubelet Node node1 has been rebooted, boot id: 4ce4678f-9d93-4c07-8f8f-de94070d807f
Normal Starting 39m kube-proxy Starting kube-proxy.
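The resource check behind this output can be sketched as follows. It is a simplified model rather than the scheduler's actual code: the `Node` class, `fits`, and `feasible_nodes` are hypothetical names, and the node1 figures (2 CPUs, 3739864Ki allocatable, 350m and 114Mi already requested) are taken from the `kubectl describe node node1` output above. Only requests are compared against allocatable capacity; limits play no part in this check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int       # allocatable CPU in millicores
    allocatable_mem_ki: int      # allocatable memory in KiB
    requested_cpu_m: int = 0     # sum of CPU requests of Pods already placed
    requested_mem_ki: int = 0    # sum of memory requests of Pods already placed


def fits(node: Node, cpu_m: int, mem_ki: int) -> bool:
    """True if the new requests still fit under the node's allocatable amounts."""
    return (node.requested_cpu_m + cpu_m <= node.allocatable_cpu_m
            and node.requested_mem_ki + mem_ki <= node.allocatable_mem_ki)


def feasible_nodes(nodes: List[Node], cpu_m: int, mem_ki: int) -> List[str]:
    """Names of the nodes on which a Pod with these requests could be placed."""
    return [n.name for n in nodes if fits(n, cpu_m, mem_ki)]


# node1 as reported above: 2000m / 3739864Ki allocatable, 350m / 114Mi requested.
node1 = Node("node1", 2000, 3739864, 350, 114 * 1024)
```

A second Pod requesting 250m and 64Mi would still fit on node1 (600m of 2000m), while a hypothetical Pod requesting 1800m would not, because 350m + 1800m exceeds the 2000m allocatable.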
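nodeSelector & nodeAffinity
nodeSelector:
Schedules a Pod onto a Node whose labels match. If no Node carries the matching labels, the Pod cannot be scheduled and stays Pending.
Purpose:
- Constrain a Pod to run on specific nodes
- Match node labels exactly
Use cases:
- Dedicated nodes: group Nodes by line of business
- Special hardware: some Nodes have SSDs or GPUs
Example: make sure a Pod is assigned to a node that has an SSD (the commands below use the label app=nginx for illustration)
Format: kubectl label nodes <node-name> <label-key>=<label-value>
For example: kubectl label nodes node2 app=nginx
Verify the label: kubectl get nodes node2 --show-labels
Remove the label: kubectl label nodes node2 app-
Verify the Pod placement: kubectl get pod -o wide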
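The exact-match rule of nodeSelector amounts to a dictionary comparison. A minimal sketch, assuming labels and selectors are plain string dictionaries; `matches_node_selector` is a hypothetical helper name, not part of any Kubernetes client.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """True if every key/value pair of the selector is present on the node."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


# Labels of node2 after `kubectl label nodes node2 app=nginx`.
node2_labels = {
    "app": "nginx",
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/hostname": "node2",
    "kubernetes.io/os": "linux",
}
```

A selector of `{"app": "nginx"}` matches these labels. Once the `app` label is removed from the node the same selector no longer matches, which is the situation in which the Pod stays Pending.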
Example
Successful scheduling
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
[root@master ~]# kubectl get nodes node2 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node2 Ready <none> 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# cat jj.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    app: nginx
[root@master ~]# kubectl apply -f jj.yml
pod/nginx created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 13s 10.244.2.48 node2 <none> <none>
Failed scheduling
// Remove the label
[root@master ~]# kubectl label nodes node2 app-
node/node2 labeled
// Verify
[root@master ~]# kubectl get nodes node2 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node2 Ready <none> 5d4h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# cat jj.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    app: nginx
[root@master ~]# kubectl delete -f jj.yml
pod "nginx" deleted
[root@master ~]# kubectl apply -f jj.yml
pod/nginx created
// The Pod is now waiting: it stays Pending until some node gets the label app=nginx, and is then scheduled onto that node
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 0/1 Pending 0 14s <none> <none> <none> <none>
// Now add the label to node2
[root@master ~]# kubectl get nodes node2 --show-labels # confirm that node2 does not have the app label yet
NAME STATUS ROLES AGE VERSION LABELS
node2 Ready <none> 5d4h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
// The label has just been added
[root@master ~]# kubectl get nodes node2 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node2 Ready <none> 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
// The Pod has been scheduled onto node2
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 5m 10.244.2.49 node2 <none> <none>
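nodeAffinity:
Node affinity serves the same purpose as nodeSelector but is more flexible and can express more conditions:
- Matching supports richer logical combinations, not just exact string equality
- Rules can be hard or soft, rather than hard requirements only
  - Hard (required): must be satisfied
  - Soft (preferred): the scheduler tries to satisfy it but gives no guarantee
- Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt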
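The six operators can be modelled in a few lines. This is a sketch of the matching rules as they are generally documented, not the scheduler's own code: `match_expression` is a hypothetical function, and the `cores` label in the sample data is an assumption added only to exercise Gt and Lt.

```python
from typing import Sequence


def match_expression(labels: dict, key: str, operator: str,
                     values: Sequence[str] = ()) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A node without the key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator in ("Gt", "Lt"):
        # Gt and Lt compare the label value and the single given value as integers.
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False
        return actual > bound if operator == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {operator}")


# Sample labels; "cores" is a made-up label used only for the numeric operators.
sample_labels = {"app": "nginx", "gpu": "nvdia", "cores": "8"}
```

For instance, `match_expression(sample_labels, "app", "In", ["nginx"])` is true, and `match_expression(sample_labels, "cores", "Gt", ["4"])` is true because 8 is greater than 4.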
// Help
[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity
KIND: Pod
VERSION: v1
RESOURCE: nodeAffinity <Object>
DESCRIPTION:
Describes node affinity scheduling rules for the pod.
Node affinity is a group of node affinity scheduling rules.
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
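Example
Case 1 (node1 is preferred and gets the Pod)
node1 gets two labels (app=nginx gpu=nvdia)
node2 gets one label (app=nginx)
- required: must be satisfied
- preferred: the scheduler tries to satisfy it but gives no guarantee
// Give node1 two labels (app=nginx gpu=nvdia)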
[root@master ~]# kubectl label nodes node1 app=nginx gpu=nvdia
node/node1 labeled
// Give node2 one label (app=nginx)
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
[root@master ~]# kubectl get nodes node1 node2 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1 Ready <none> 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gpu=nvdia,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
node2 Ready <none> 5d4h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# cat yy.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia
[root@master ~]# kubectl apply -f yy.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 1 87s 10.244.1.97 node1 <none> <none>
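Case 2 (neither node matches the preferred rule, so the default scoring decides)
node1 gets one label (app=nginx)
node2 gets one label (app=nginx)
- required: must be satisfied
- preferred: the scheduler tries to satisfy it but gives no guarantee
// Give node1 one label (app=nginx)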
[root@master ~]# kubectl label nodes node1 app=nginx
node/node1 labeled
// Give node2 one label (app=nginx)
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
[root@master ~]# kubectl get nodes node1 node2 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1 Ready <none> 5d5h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
node2 Ready <none> 5d5h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# cat yy.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia
[root@master ~]# kubectl apply -f yy.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 6s 10.244.1.98 node1 <none> <none>
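The difference between the two cases comes down to how the required and preferred rules combine. The sketch below is a simplified model: `rank_nodes` and `in_match` are hypothetical names, only the `In` operator is handled, and the real scheduler combines this affinity score with its other scoring rules before it picks a node.

```python
from typing import Dict, List, Sequence, Tuple

Required = Tuple[str, Sequence[str]]           # (key, allowed values)
Preferred = Tuple[int, str, Sequence[str]]     # (weight, key, allowed values)


def in_match(labels: dict, key: str, values: Sequence[str]) -> bool:
    """The `In` operator: the label exists and its value is one of `values`."""
    return labels.get(key) in values


def rank_nodes(nodes: Dict[str, dict], required: List[Required],
               preferred: List[Preferred]) -> Dict[str, int]:
    """Drop nodes that fail a required rule, then sum matching preferred weights."""
    scores: Dict[str, int] = {}
    for name, labels in nodes.items():
        if not all(in_match(labels, key, values) for key, values in required):
            continue  # hard rule not met: the node is not a candidate at all
        scores[name] = sum(weight for weight, key, values in preferred
                           if in_match(labels, key, values))
    return scores


required = [("app", ["nginx"])]
preferred = [(3, "gpu", ["nvdia"])]

case1 = {"node1": {"app": "nginx", "gpu": "nvdia"}, "node2": {"app": "nginx"}}
case2 = {"node1": {"app": "nginx"}, "node2": {"app": "nginx"}}
```

In case 1, `rank_nodes(case1, required, preferred)` gives node1 a score of 3 and node2 a score of 0, so node1 is favoured. In case 2 both nodes score 0, so the affinity rule expresses no preference and the choice falls to the scheduler's remaining scoring.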
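Taints & Tolerations
Taints: keep Pods from being scheduled onto particular Nodes
Tolerations: allow Pods to be scheduled onto Nodes that carry matching Taints
Use cases:
- Dedicated nodes: group Nodes by line of business; by default nothing is scheduled onto these nodes, and only Pods with a matching toleration may be placed there
- Special hardware: some Nodes have SSDs or GPUs; by default nothing is scheduled onto these nodes, and only Pods with a matching toleration may be placed there
- Taint-based eviction
Adding a taint to a node
Format: kubectl taint node [node] key=value:[effect]
For example: kubectl taint node node1 gpu=yes:NoSchedule
Verify: kubectl describe node node1 | grep Taint
Remove a taint: kubectl taint node [node] key:[effect]-
// View the taints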
[root@master ~]# kubectl describe node node1 node2 master | grep -i taint
Taints: <none>
Taints: <none>
Taints: node-role.kubernetes.io/master:NoSchedule
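[effect] can take the following values:
- NoSchedule: Pods that do not tolerate the taint are never scheduled onto the node
- PreferNoSchedule: the scheduler tries to avoid the node but may still use it; a toleration is not required
- NoExecute: new Pods are not scheduled onto the node, and Pods already running there that do not tolerate the taint are evicted
Add a tolerations field to the Pod spec
// Example
apiVersion: v1
kind: Pod
metadata:
  name: pod-taints
spec:
  containers:
  - name: pod-taints
    image: busybox:latest
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "yes"
    effect: "NoSchedule"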
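How a toleration is compared with a taint, and what the three effects mean for a Pod, can be sketched like this. It is a simplified model under stated assumptions: `tolerates`, `is_schedulable`, and `is_evicted` are hypothetical names, taints and tolerations are plain dictionaries, and timing options such as a grace period before eviction are left out.

```python
from typing import List


def tolerates(toleration: dict, taint: dict) -> bool:
    """True if this single toleration matches this single taint."""
    # An unset effect in the toleration matches any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An unset key matches any key (normally combined with operator Exists).
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    # Operator Equal (the default): the values must be identical.
    return toleration.get("value", "") == taint.get("value", "")


def is_schedulable(taints: List[dict], tolerations: List[dict]) -> bool:
    """NoSchedule and NoExecute taints must all be tolerated; PreferNoSchedule is soft."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


def is_evicted(taints: List[dict], tolerations: List[dict]) -> bool:
    """A running Pod is evicted if the node has a NoExecute taint it does not tolerate."""
    no_execute = [t for t in taints if t["effect"] == "NoExecute"]
    return any(not any(tolerates(tol, t) for tol in tolerations)
               for t in no_execute)


# The taint from `kubectl taint node node1 gpu=yes:NoSchedule` and the matching
# toleration from the Pod manifest above.
gpu_taint = {"key": "gpu", "value": "yes", "effect": "NoSchedule"}
gpu_toleration = {"key": "gpu", "operator": "Equal", "value": "yes",
                  "effect": "NoSchedule"}
```

With these definitions, `is_schedulable([gpu_taint], [gpu_toleration])` is true, while a Pod with no tolerations is rejected by the same taint. A node that carries only a PreferNoSchedule taint remains schedulable for every Pod.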
Example
Case 1 (NoSchedule)
Pods cannot be scheduled onto the node
// Add a taint to node1
[root@master ~]# kubectl taint node node1 node1:NoSchedule
node/node1 tainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: node1:NoSchedule
[root@master ~]# cat yy.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
[root@master ~]# kubectl apply -f yy.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 6s 10.244.2.50 node2 <none> <none>
// Remove the taint
[root@master ~]# kubectl taint node node1 node1:NoSchedule-
node/node1 untainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: <none>
Case 2 (PreferNoSchedule)
The scheduler tries to avoid the node, but may still place Pods on it
[root@master ~]# kubectl taint node node1 node1:PreferNoSchedule
node/node1 tainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: node1:PreferNoSchedule
[root@master ~]# cat yy.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
[root@master ~]# kubectl apply -f yy.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 3s 10.244.2.51 node2 <none>
Case 3 (NoExecute)
Eviction
New Pods are not scheduled onto the node, and Pods already running on it are evicted
[root@master ~]# cat yy.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: b1
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh","-c","sleep 45"]
// Remove the label from node2 and keep the one on node1
[root@master ~]# kubectl label nodes node2 app-
node/node2 labeled
[root@master ~]# kubectl get nodes node1 node2 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1 Ready <none> 5d6h v1.20.0 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
node2 Ready <none> 5d6h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# kubectl apply -f yy.yml
deployment.apps/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-d6cbf6c57-qbpz8 1/1 Running 0 8s 10.244.1.108 node1 <none> <none>
// After adding a taint to node1
[root@master ~]# kubectl taint node node1 node1:NoExecute
node/node1 tainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: node1:NoExecute
// The Pod was evicted from node1 and its replacement now runs on node2
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-d6cbf6c57-r8d7l 1/1 Running 0 19s 10.244.2.52 node2 <none> <none>
Example
Toleration
// Help
[root@master ~]# kubectl explain deploy.spec.template.spec
KIND: Deployment
VERSION: apps/v1
RESOURCE: spec <Object>
DESCRIPTION:
Specification of the desired behavior of the pod. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
......
tolerations <[]Object>
If specified, the pod's tolerations.
topologySpreadConstraints <[]Object>
TopologySpreadConstraints describes how a group of pods ought to spread
across topology domains. Scheduler will schedule pods in a way which abides
by the constraints. All topologySpreadConstraints are ANDed.
volumes <[]Object>
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
[root@master ~]# cat yy.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: b1
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh","-c","sleep 45"]
      tolerations:
      - key: node1
        effect: NoExecute   # tolerate the node1:NoExecute taint
// The taint is still present on node1
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: node1:NoExecute
[root@master ~]# kubectl apply -f yy.yml
deployment.apps/test configured
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-84d56c4b4c-59m89 1/1 Running 0 3s 10.244.1.109 node1 <none> <none> # now running on node1
test-d6cbf6c57-r8d7l 0/1 Terminating 6 12m 10.244.2.52 node2 <none> <none> # the old Pod is being terminated
nodeName
- nodeName: names the target Node directly. The Pod is placed on that Node without going through the scheduler.
[root@master ~]# cat yy.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeName: node2
      containers:
      - name: b1
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh","-c","sleep 45"]
[root@master ~]# kubectl apply -f yy.yml
deployment.apps/test created
[root@master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test-6d6f74fd55-dgbtg 1/1 Running 0 3s
// The Pod is running on node2
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-6d6f74fd55-dgbtg 1/1 Running 0 8s 10.244.2.53 node2 <none>
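What "bypassing the scheduler" means can be shown with a tiny model. This is a sketch only: `assign_node` is a hypothetical function, and `scheduler_candidates` stands for whatever list of nodes the scheduler's own filtering would have produced.

```python
from typing import List, Optional


def assign_node(pod: dict, scheduler_candidates: List[str]) -> Optional[str]:
    """Return the node a Pod ends up on, or None if it stays unscheduled."""
    if pod.get("nodeName"):
        # nodeName is already set, so the scheduler's candidate list is never consulted.
        return pod["nodeName"]
    # Otherwise the Pod can only go to a node the scheduler considers feasible.
    return scheduler_candidates[0] if scheduler_candidates else None
```

Even with an empty candidate list, `assign_node({"name": "test", "nodeName": "node2"}, [])` returns "node2". Because the scheduler is skipped, scheduling-time rules such as nodeSelector matching or NoSchedule taints are not evaluated for such a Pod.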