7. Deploying heapster

heapster shows, in the dashboard, how much resource each container is consuming. The numbers are only indicative; real monitoring is done with Prometheus.
heapster depends on the dashboard and runs as a dashboard add-on.

1. Prepare the image:

[root@hdss7-200 dashboard]# docker pull quay.io/bitnami/heapster:1.5.4
[root@hdss7-200 dashboard]# docker image ls |grep heapster
quay.io/bitnami/heapster     1.5.4    c359b95ad38b        22 months ago       136MB

[root@hdss7-200 dashboard]# docker image tag c359b95ad38b harbor.od.com:180/public/heapster:v1.5.4
[root@hdss7-200 dashboard]# docker login harbor.od.com:180
[root@hdss7-200 dashboard]# docker push harbor.od.com:180/public/heapster:v1.5.4

2. Prepare the resource manifests

[root@hdss7-200 heapster]# mkdir -p /data/k8s-yaml/dashboard/heapster;cd /data/k8s-yaml/dashboard/heapster
[root@hdss7-200 dashboard]# vi /data/k8s-yaml/dashboard/heapster/rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: heapster
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: heapster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:heapster
subjects:
- kind: ServiceAccount
  name: heapster
  namespace: kube-system

# ClusterRoleBinding: bind the ServiceAccount named heapster to the cluster's built-in role system:heapster
[root@hdss7-21 ~]# kubectl get clusterrole |egrep "system:heapster"
system:heapster        13d
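If you want to see exactly what permissions the built-in role grants, you can inspect it directly (a quick optional check; output omitted here):

[root@hdss7-21 ~]# kubectl describe clusterrole system:heapster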

[root@hdss7-200 dashboard]# vi /data/k8s-yaml/dashboard/heapster/deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: heapster
    spec:
      serviceAccountName: heapster
      containers:
      - name: heapster
        image: harbor.od.com:180/public/heapster:v1.5.4
        imagePullPolicy: IfNotPresent
        command:
        - /opt/bitnami/heapster/bin/heapster
        - --source=kubernetes:https://kubernetes.default

command:      # the command the container runs, equivalent to CMD in a Dockerfile
        - /opt/bitnami/heapster/bin/heapster   # the heapster binary that provides the reference metrics
        - --source=kubernetes:https://kubernetes.default  # kubernetes.default is the short form of kubernetes.default.svc.cluster.local. Inside the cluster a Service can be addressed by its name because CoreDNS resolves it, so kubernetes.default resolves to 192.168.0.1:443, the cluster address of the kube-apiserver.
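To confirm that the short name really resolves inside the cluster, you can run a throwaway pod and query it (the busybox image and tag here are only an example; pull the image through your local harbor first if the nodes cannot reach Docker Hub):

[root@hdss7-21 ~]# kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
# should return 192.168.0.1, the ClusterIP of the kubernetes service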

[root@hdss7-200 dashboard]# vi /data/k8s-yaml/dashboard/heapster/service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Heapster
  name: heapster
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    k8s-app: heapster
[root@hdss7-22 ~]# kubectl apply -f http://k8s-yaml.od.com/dashboard/heapster/rbac.yaml
serviceaccount/heapster created
clusterrolebinding.rbac.authorization.k8s.io/heapster created

[root@hdss7-22 ~]# kubectl apply -f http://k8s-yaml.od.com/dashboard/heapster/deployment.yaml
deployment.extensions/heapster created

[root@hdss7-22 ~]# kubectl apply -f http://k8s-yaml.od.com/dashboard/heapster/service.yaml
service/heapster created

[root@hdss7-22 ~]# kubectl get pods -n kube-system
NAME                                    READY   STATUS              RESTARTS   AGE
coredns-6b6c4f9648-p7t7g                1/1     Running             12         12d
heapster-85c94856f7-mg8zd               0/1     ContainerCreating   0          58s
kubernetes-dashboard-7977cc79db-kgqdb   1/1     Running             0          62m
traefik-ingress-v7w25                   1/1     Running             6          7d7h
traefik-ingress-xs4md                   1/1     Running             6          7d7h
[root@hdss7-22 ~]# kubectl get pods -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-6b6c4f9648-p7t7g                1/1     Running   12         12d
heapster-85c94856f7-mg8zd               1/1     Running   0          60s
kubernetes-dashboard-7977cc79db-kgqdb   1/1     Running   0          62m
traefik-ingress-v7w25                   1/1     Running   6          7d7h
traefik-ingress-xs4md                   1/1     Running   6          7d7h
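Once the pod is Running, a quick sanity check that the Service has an endpoint and that heapster can reach the apiserver does not hurt (outputs omitted; deploy/heapster is the usual kubectl shorthand for the Deployment's pods):

[root@hdss7-22 ~]# kubectl get svc -n kube-system heapster
[root@hdss7-22 ~]# kubectl get endpoints -n kube-system heapster
[root@hdss7-22 ~]# kubectl logs -n kube-system deploy/heapster --tail=20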

3. Restart the dashboard pod to verify:

[root@hdss7-21 ~]# kubectl delete pod kubernetes-dashboard-7977cc79db-kgqdb -n kube-system
pod "kubernetes-dashboard-7977cc79db-6p4g2" deleted
[root@hdss7-21 ~]# kubectl get pods -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-6b6c4f9648-p7t7g                1/1     Running   6          12d
heapster-85c94856f7-mg8zd               1/1     Running   0          60s
kubernetes-dashboard-7977cc79db-55l77   1/1     Running   0          31s
traefik-ingress-v7w25                   1/1     Running   10         7d7h
traefik-ingress-xs4md                   1/1     Running   10         7d7h

8. Cluster maintenance: upgrading cluster nodes smoothly

Why upgrade: Kubernetes is open source, so its vulnerabilities eventually become public as well. Even if the cluster runs only on an internal network, it is best to keep it up to date, and the upgrade should be done during a traffic trough.

Example: the current version is v1.15.2, and we upgrade to v1.15.4.

1. Check the current node versions

[root@hdss7-22 ~]# kubectl get node -o wide

NAME                STATUS   ROLES         AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
hdss7-21.host.com   Ready    master,node   22d   v1.15.2   10.4.7.21     <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
hdss7-22.host.com   Ready    master,node   22d   v1.15.2   10.4.7.22     <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13

2. Move the pods off the node

Approach: pick a traffic trough for the upgrade, but that does not mean taking the whole service down. Upgrade part of the nodes first and let their pods be rescheduled onto the remaining nodes, i.e. remove one node, upgrade it, put it back, then repeat with the next one.
Which node first: look at the pods that are currently running and start with the node that runs the fewest of them. Moving pods can still cause brief interruptions, so keep each step as small as possible.
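This walkthrough removes the node with kubectl delete node. A commonly used, slightly gentler alternative is to cordon and drain the node, which evicts the pods through the eviction API instead (flag names as in kubectl 1.15; DaemonSet pods such as traefik stay on the node):

[root@hdss7-22 ~]# kubectl cordon hdss7-22.host.com                                         # stop scheduling new pods onto the node
[root@hdss7-22 ~]# kubectl drain hdss7-22.host.com --ignore-daemonsets --delete-local-data  # evict the pods running there
# ... upgrade the node ...
[root@hdss7-22 ~]# kubectl uncordon hdss7-22.host.com                                       # let it take pods again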

[root@hdss7-22 ~]# kubectl get pods -n kube-system -o wide        # node 22 is running fewer pods, so remove and upgrade 22 first

NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
coredns-6b6c4f9648-p7t7g                1/1     Running   12         12d     172.7.21.2   hdss7-21.host.com   <none>           <none>
heapster-85c94856f7-mg8zd               1/1     Running   0          16h     172.7.21.6   hdss7-21.host.com   <none>           <none>
kubernetes-dashboard-7977cc79db-kgqdb   1/1     Running   0          17h     172.7.22.5   hdss7-22.host.com   <none>           <none>
traefik-ingress-v7w25                   1/1     Running   6          7d23h   172.7.22.4   hdss7-22.host.com   <none>           <none>
traefik-ingress-xs4md                   1/1     Running   6          7d23h   172.7.21.4   hdss7-21.host.com   <none>           <none>

Take the hdss7-22.host.com node out of the k8s cluster:
[root@hdss7-22 ~]# kubectl delete node hdss7-22.host.com
node "hdss7-22.host.com" deleted

[root@hdss7-22 ~]# kubectl get node        # only one node is left now
NAME                STATUS   ROLES         AGE   VERSION
hdss7-21.host.com   Ready    master,node   22d   v1.15.2

[root@hdss7-22 ~]# kubectl get pods -n kube-system -o wide     # with hdss7-22.host.com gone, the cluster recreates the pods that ran on it over on hdss7-21.host.com

NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
coredns-6b6c4f9648-p7t7g                1/1     Running   12         12d     172.7.21.2   hdss7-21.host.com   <none>           <none>
heapster-85c94856f7-mg8zd               1/1     Running   0          16h     172.7.21.6   hdss7-21.host.com   <none>           <none>
kubernetes-dashboard-7977cc79db-r25x6   1/1     Running   0          2m18s   172.7.21.8   hdss7-21.host.com   <none>           <none>
traefik-ingress-xs4md                   1/1     Running   6          7d23h   172.7.21.4   hdss7-21.host.com   <none>           <none>

Check that the kubernetes-dashboard page still works; its pod was just recreated from hdss7-22.host.com, and normally it is fine. Removing a single node does not hurt the cluster: the cluster self-heals, and CoreDNS keeps resolving cluster names.

[root@hdss7-22 ~]# dig -t A kubernetes.default.svc.cluster.local @192.168.0.2 +short  # verify that CoreDNS still resolves
192.168.0.1

3. Comment the node out of the L4 and L7 load balancers

What the L4 and L7 load balancers do here:

1. External layer-7 traffic for *.od.com is reverse-proxied by nginx to the backends (10.4.7.21:81, 10.4.7.22:81), i.e. the traefik ingress controllers. Remove the node-22 traefik backend so that requests hitting 10.4.7.10 are no longer proxied to node 22 and returned as a bad gateway.
2. Inside the cluster, components reach the apiserver over layer 4, and 10.4.7.10:7443 is the L4 load balancer in front of the apiservers (10.4.7.21:6443, 10.4.7.22:6443). Remove the node-22 apiserver backend so that requests are not sent to it and fail.

How to change it:
Check which physical machine currently holds the VIP and edit the nginx config on that machine; editing the nginx config on every LB machine also works.

[root@hdss7-11 conf.d]# ip addr show |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.4.7.11/8 brd 10.255.255.255 scope global noprefixroute ens33
    inet 10.4.7.10/32 scope global ens33

# comment out the 10.4.7.22:6443 apiserver backend (layer 4)
[root@hdss7-11 ~]# vi /etc/nginx/nginx.conf

# comment out the 10.4.7.22:81 backend that hands *.od.com traffic to the traefik controller on node 22 (layer 7)
[root@hdss7-11 ~]# vi /etc/nginx/conf.d/od.com.conf
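For reference, after the edit the relevant upstream blocks would look roughly like this (a sketch based on the addresses described above; the upstream names and the exact layout of your files may differ):

# /etc/nginx/nginx.conf — stream block, L4 load balancing for the apiservers
stream {
    upstream kube-apiserver {
        server 10.4.7.21:6443     max_fails=3 fail_timeout=30s;
        # server 10.4.7.22:6443     max_fails=3 fail_timeout=30s;    # disabled during the upgrade
    }
    server {
        listen 7443;
        proxy_connect_timeout 2s;
        proxy_timeout 900s;
        proxy_pass kube-apiserver;
    }
}

# /etc/nginx/conf.d/od.com.conf — L7, *.od.com -> traefik
upstream default_backend_traefik {
    server 10.4.7.21:81    max_fails=3 fail_timeout=10s;
    # server 10.4.7.22:81    max_fails=3 fail_timeout=10s;           # disabled during the upgrade
}
server {
    server_name *.od.com;
    location / {
        proxy_pass http://default_backend_traefik;
        proxy_set_header Host       $http_host;
        proxy_set_header x-forwarded-for $proxy_add_x_forwarded_for;
    }
}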

[root@hdss7-11 conf.d]# nginx -s reload

4. Download the v1.15.4 upgrade package

[root@hdss7-22 ~]# cd /opt/
[root@hdss7-22 opt]# cd src/
[root@hdss7-22 src]# ll
-rw-r--r--. 1 root root   9850227 10月 11 2018 etcd-v3.1.20-linux-amd64.tar.gz
-rw-r--r--. 1 root root   9706487 12月  2 20:27 flannel-v0.10.0-linux-amd64.tar.gz
-rw-r--r--. 1 root root 443770238 9月  22 20:54 kubernetes-server-linux-amd64.tar.gz

[root@hdss7-22 src]# mv kubernetes-server-linux-amd64.tar.gz kubernetes-server-linux-amd64-v1.15.2.tar.gz   # rename the existing v1.15.2 package so the version is in the name
[root@hdss7-22 src]# wget https://dl.k8s.io/v1.15.4/kubernetes-server-linux-amd64.tar.gz # download the v1.15.4 package
[root@hdss7-22 src]# mv kubernetes-server-linux-amd64.tar.gz kubernetes-server-linux-amd64-v1.15.4.tar.gz   # rename the v1.15.4 package so the version is in the name

[root@hdss7-22 src]# ll
-rw-r--r--. 1 root root   9850227 10月 11 2018 etcd-v3.1.20-linux-amd64.tar.gz
-rw-r--r--. 1 root root   9706487 12月  2 20:27 flannel-v0.10.0-linux-amd64.tar.gz
-rw-r--r--. 1 root root 443770238 9月  22 20:54 kubernetes-server-linux-amd64-v1.15.2.tar.gz
-rw-r--r--. 1 root root 443976803 9月  19 2019 kubernetes-server-linux-amd64-v1.15.4.tar.gz

5. Upgrade the node

[root@hdss7-22 opt]# rm -f /opt/kubernetes  # remove the old /opt/kubernetes symlink; the v1.15.4 tarball also extracts to a directory named kubernetes
[root@hdss7-22 opt]# tar -zxvf /opt/src/kubernetes-server-linux-amd64-v1.15.4.tar.gz -C /opt/
[root@hdss7-22 opt]# mv kubernetes kubernetes-v1.15.4
[root@hdss7-22 opt]# ln -s kubernetes-v1.15.4 kubernetes  # /opt/kubernetes now points to kubernetes-v1.15.4
[root@hdss7-22 opt]# ll
lrwxrwxrwx. 1 root root  18 12月 23 11:06 kubernetes -> kubernetes-v1.15.4
drwxr-xr-x. 4 root root  50 11月 30 13:07 kubernetes-v1.15.2
drwxr-xr-x. 4 root root  79 9月  18 2019 kubernetes-v1.15.4

[root@hdss7-22 opt]# cd /opt/kubernetes/server/bin/
[root@hdss7-22 bin]# rm -f *.tar *_tag

[root@hdss7-22 bin]# cp -r /opt/kubernetes-v1.15.2/server/bin/conf/ ./  # copy the config directory over from kubernetes-v1.15.2
[root@hdss7-22 bin]# cp -r /opt/kubernetes-v1.15.2/server/bin/cert/ ./  # copy the certificates over from kubernetes-v1.15.2
[root@hdss7-22 bin]# cp -r /opt/kubernetes-v1.15.2/server/bin/*.sh ./   # copy the startup scripts over from kubernetes-v1.15.2
[root@hdss7-22 bin]# ll
-rwxr-xr-x. 1 root root  43538912 9月  18 2019 apiextensions-apiserver
drwxr-xr-x. 2 root root       228 12月 23 11:07 cert
-rwxr-xr-x. 1 root root 100605984 9月  18 2019 cloud-controller-manager
drwxr-xr-x. 2 root root        79 12月 23 11:07 conf
-rwxr-xr-x. 1 root root 200722064 9月  18 2019 hyperkube
-rwxr-xr-x. 1 root root  40186304 9月  18 2019 kubeadm
-rwxr-xr-x. 1 root root 164563360 9月  18 2019 kube-apiserver
-rwxr-xr-x. 1 root root      1222 12月 23 11:07 kube-apiserver-startup.sh
-rwxr-xr-x. 1 root root 116462624 9月  18 2019 kube-controller-manager
-rwxr--r--. 1 root root       450 12月 23 11:07 kube-controller-manager-startup.sh
-rwxr-xr-x. 1 root root  42985504 9月  18 2019 kubectl
-rwxr-xr-x. 1 root root 119690288 9月  18 2019 kubelet
-rwxr-xr-x. 1 root root       813 12月 23 11:07 kubelet-startup.sh
-rwxr-xr-x. 1 root root  36987488 9月  18 2019 kube-proxy
-rwxr-xr-x. 1 root root       293 12月 23 11:07 kube-proxy-startup.sh
-rwxr-xr-x. 1 root root  38786144 9月  18 2019 kube-scheduler
-rwxr--r--. 1 root root       253 12月 23 11:07 kube-scheduler-startup.sh
-rwxr-xr-x. 1 root root   1648224 9月  18 2019 mounter
[root@hdss7-22 bin]# supervisorctl restart all   # everything is restarted at once here; in production prefer restarting one component at a time, e.g. supervisorctl restart kube-kubelet-7-22 (see the sketch at the end of this step)
kube-kubelet-7-22: stopped
flanneld-7-22: stopped
kube-apiserver-7-22: stopped
kube-proxy-7-22: stopped
etcd-server-7-22: stopped
flanneld-7-22: ERROR (spawn error)
kube-kubelet-7-22: started
kube-apiserver-7-22: started
kube-proxy-7-22: started
kube-controller-manager-7-22: started
kube-scheduler-7-22: started
etcd-server-7-22: started
[root@hdss7-22 bin]# supervisorctl status  # flanneld did not come up
etcd-server-7-22                   RUNNING   pid 2275, uptime 0:00:36
flanneld-7-22                      FATAL     Exited too quickly (process log may have details)
kube-apiserver-7-22                RUNNING   pid 2264, uptime 0:00:36
kube-controller-manager-7-22       RUNNING   pid 2269, uptime 0:00:36
kube-kubelet-7-22                  RUNNING   pid 2262, uptime 0:00:36
kube-proxy-7-22                    RUNNING   pid 2266, uptime 0:00:36
kube-scheduler-7-22                RUNNING   pid 2270, uptime 0:00:36

Analysis: as discussed earlier, this happens because the old flanneld process was not fully stopped and still holds its port; it has to be killed.

[root@hdss7-22 bin]# tail -200f /data/logs/flanneld/flanneld.stdout.log 
error. listen tcp 0.0.0.0:2401: bind: address already in use

Fix:
[root@hdss7-22 cert]# ps aux |grep flanneld  

root       1190  1.3  0.7 300556  7584 ?        Sl   14:05   0:06 /opt/flannel/flanneld --public-ip=10.4.7.22 --etcd-endpoints=https://10.4.7.12:2379,https://10.4.7.21:2379,https://10.4.7.22:2379 --etcd-keyfile=./cert/client-key.pem --etcd-certfile=./cert/client.pem --etcd-cafile=./cert/ca.pem --iface=ens33 --subnet-file=./subnet.env --healthz-port=2401
root       4322  0.0  0.0 112832   976 pts/0    S+   14:13   0:00 grep --color=auto flanneld

[root@hdss7-22 opt]# kill -9 1190
[root@hdss7-22 opt]# supervisorctl restart all

[root@hdss7-22 opt]# supervisorctl status   # after the restart, kube-kubelet did not come up
etcd-server-7-22                            RUNNING   pid 4658, uptime 0:04:11
flanneld-7-22                               RUNNING   pid 4499, uptime 0:04:25
kube-apiserver-7-22                         RUNNING   pid 4500, uptime 0:04:25
kube-controller-manager-7-22                RUNNING   pid 4502, uptime 0:04:25
kube-kubelet-7-22                           FATAL     Exited too quickly (process log may have details)
kube-proxy-7-22                             RUNNING   pid 4501, uptime 0:04:25
kube-scheduler-7-22                         RUNNING   pid 4506, uptime 0:04:25

Find the cause:

[root@hdss7-22 opt]# tail -200 /data/logs/kubernetes/kube-kubelet/kubelet.stdout.log 
 Image garbage collection failed once. Stats initialization may not have completed yet: 
failed to get imageFs info: unable to find data in memory cache   # again caused by a process that was not fully stopped before

Fix:
[root@hdss7-22 opt]# ps aux |grep kubelet

root       4526  4.0 23.7 470756 236936 ?       Sl   14:13   0:22 /opt/kubernetes/server/bin/kube-apiserver --apiserver-count 2 --audit-log-path /data/logs/kubernetes/kube-apiserver/audit-log --audit-policy-file ./conf/audit.yaml --authorization-mode RBAC --client-ca-file ./cert/ca.pem --requestheader-client-ca-file ./cert/ca.pem --enable-admission-plugins NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --etcd-cafile ./cert/ca.pem --etcd-certfile ./cert/client.pem --etcd-keyfile ./cert/client-key.pem --etcd-servers https://10.4.7.12:2379,https://10.4.7.21:2379,https://10.4.7.22:2379 --service-account-key-file ./cert/ca-key.pem --service-cluster-ip-range 192.168.0.0/16 --service-node-port-range 3000-29999 --target-ram-mb=1024 --kubelet-client-certificate ./cert/client.pem --kubelet-client-key ./cert/client-key.pem --log-dir /data/logs/kubernetes/kube-apiserver --tls-cert-file ./cert/apiserver.pem --tls-private-key-file ./cert/apiserver-key.pem --v 2
root       6651  0.0  0.0 112828   976 pts/0    R+   14:23   0:00 grep --color=auto kubelet

[root@hdss7-22 opt]# kill -9 4526
[root@hdss7-22 opt]# supervisorctl start kube-kubelet-7-22
kube-kubelet-7-22: started

[root@hdss7-22 opt]# supervisorctl status
etcd-server-7-22                         RUNNING   pid 4658, uptime 0:11:47
flanneld-7-22                            RUNNING   pid 4499, uptime 0:12:01
kube-apiserver-7-22                      RUNNING   pid 6779, uptime 0:01:42
kube-controller-manager-7-22             RUNNING   pid 4502, uptime 0:12:01
kube-kubelet-7-22                        RUNNING   pid 6987, uptime 0:00:56
kube-proxy-7-22                          RUNNING   pid 4501, uptime 0:12:01
kube-scheduler-7-22                      RUNNING   pid 4506, uptime 0:12:01
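A minimal sketch of the component-by-component restart mentioned earlier in this step (program names taken from the supervisorctl output above; check supervisorctl status after each command before moving on):

[root@hdss7-22 bin]# supervisorctl restart kube-proxy-7-22
[root@hdss7-22 bin]# supervisorctl restart kube-kubelet-7-22
[root@hdss7-22 bin]# supervisorctl restart kube-scheduler-7-22
[root@hdss7-22 bin]# supervisorctl restart kube-controller-manager-7-22
[root@hdss7-22 bin]# supervisorctl restart kube-apiserver-7-22
[root@hdss7-22 bin]# supervisorctl status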

6. Remove the nginx comments again and verify

[root@hdss7-11 ~]# vi /etc/nginx/nginx.conf   # put the 10.4.7.22:6443 apiserver backend back (remove the comment)

[root@hdss7-11 ~]# vi /etc/nginx/conf.d/od.com.conf   # put the 10.4.7.22:81 traefik backend back, then reload nginx (nginx -s reload) as in step 3

[root@hdss7-22 bin]# supervisorctl restart flanneld-7-22
flanneld-7-22: ERROR (not running)
flanneld-7-22: ERROR (spawn error)
[root@hdss7-22 bin]# systemctl restart supervisord
[root@hdss7-22 bin]# supervisorctl status
etcd-server-7-22                 STARTING  
flanneld-7-22                    STARTING  
kube-apiserver-7-22              STARTING  
kube-controller-manager-7-22     STARTING  
kube-kubelet-7-22                STARTING  
kube-proxy-7-22                  STARTING  
kube-scheduler-7-22              STARTING  

7. Verify that the node upgrade succeeded

[root@hdss7-22 bin]# kubectl get node -o wide -n kube-system

NAME                STATUS   ROLES         AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
hdss7-21.host.com   Ready    master,node   22d   v1.15.2   10.4.7.21     <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
hdss7-22.host.com   Ready    <none>        37m   v1.15.4   10.4.7.22     <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
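Note that the re-registered node comes back with ROLES <none>: the role labels were lost when the node object was deleted. They are only labels and can be restored like this (label keys assumed to match what hdss7-21.host.com uses):

[root@hdss7-22 bin]# kubectl label node hdss7-22.host.com node-role.kubernetes.io/master= node-role.kubernetes.io/node=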

[root@hdss7-22 bin]# kubectl get pods -o wide -n kube-system   # traefik-ingress is running on node 22 again, but the other pods (coredns, heapster, dashboard) are still on 21

NAME                                    READY   STATUS    RESTARTS   AGE    IP           NODE                NOMINATED NODE   READINESS GATES
coredns-6b6c4f9648-p7t7g                1/1     Running   12         12d    172.7.21.2   hdss7-21.host.com   <none>           <none>
heapster-85c94856f7-mg8zd               1/1     Running   0          18h    172.7.21.6   hdss7-21.host.com   <none>           <none>
kubernetes-dashboard-7977cc79db-r25x6   1/1     Running   0          136m   172.7.21.8   hdss7-21.host.com   <none>           <none>
traefik-ingress-t27lb                   1/1     Running   0          37m    172.7.22.2   hdss7-22.host.com   <none>           <none>
traefik-ingress-xs4md                   1/1     Running   6          8d     172.7.21.4   hdss7-21.host.com   <none>           <none>
[root@hdss7-22 bin]# kubectl delete pod coredns-6b6c4f9648-p7t7g -n kube-system # delete the coredns pod so a replacement gets scheduled
pod "coredns-6b6c4f9648-p7t7g" deleted

[root@hdss7-22 src]# kubectl get pod -nkube-system -o wide  # the new coredns pod landed on hdss7-22

NAME                                    READY   STATUS    RESTARTS   AGE    IP           NODE                NOMINATED NODE   READINESS GATES
coredns-6b6c4f9648-6gps2                1/1     Running   0          52s    172.7.22.4   hdss7-22.host.com   <none>           <none>
heapster-85c94856f7-mg8zd               1/1     Running   0          18h    172.7.21.6   hdss7-21.host.com   <none>           <none>
kubernetes-dashboard-7977cc79db-r25x6   1/1     Running   0          147m   172.7.21.8   hdss7-21.host.com   <none>           <none>
traefik-ingress-t27lb                   1/1     Running   0          49m    172.7.22.2   hdss7-22.host.com   <none>           <none>
traefik-ingress-xs4md                   1/1     Running   6          8d     172.7.21.4   hdss7-21.host.com   <none>           <none>

Note: after upgrading hdss7-22.host.com and restarting everything with supervisorctl, only pods that exist nowhere else — such as the traefik-ingress DaemonSet pod for hdss7-22.host.com — come up on the node. Pods that are already running on node 21, such as coredns, are not rescheduled just because a new node joined; you have to delete them so the scheduler places the replacements according to resource usage and its normal rules (see the check below).
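A quick way to see which pods are still sitting on a given node (standard kubectl field selector, shown here against node 21):

[root@hdss7-22 bin]# kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=hdss7-21.host.com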

9. Summary

1. Scaling a workload

Method 1: imperative (kubectl command)

[root@hdss7-22 ~]# kubectl scale deployment kubernetes-dashboard --replicas=2 -n kube-system
deployment.extensions/kubernetes-dashboard scaled
[root@hdss7-22 ~]# kubectl get pods -n kube-system    # the second replica is pulled up automatically
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-6b6c4f9648-fzxhd                1/1     Running   0          115m
heapster-85c94856f7-5h7n4               1/1     Running   0          114m
kubernetes-dashboard-7977cc79db-6nq8r   1/1     Running   1          113m
kubernetes-dashboard-7977cc79db-j5nt7   1/1     Running   0          3m1s
traefik-ingress-t27lb                   1/1     Running   1          5h16m
traefik-ingress-xs4md                   1/1     Running   7          8d

[root@hdss7-22 ~]# ipvsadm -Ln            # the ipvs rules now list two 8443 backends for the dashboard service
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.1:443 nq
  -> 10.4.7.21:6443               Masq    1      2          0         
  -> 10.4.7.22:6443               Masq    1      1          0         
TCP  192.168.0.2:53 nq
  -> 172.7.22.3:53                 Masq    1      0          0         
TCP  192.168.0.2:9153 nq
  -> 172.7.22.3:9153             Masq    1      0          0         
TCP  192.168.18.241:80 nq
  -> 172.7.21.2:80                 Masq    1      0          0         
  -> 172.7.21.3:80                 Masq    1      0          0         
TCP  192.168.73.12:80 nq
  -> 172.7.21.7:80                 Masq    1      0          0         
  -> 172.7.22.2:80                 Masq    1      0          0         
TCP  192.168.73.12:8080 nq
  -> 172.7.21.7:8080              Masq    1      0          0         
  -> 172.7.22.2:8080              Masq    1      0          0         
TCP  192.168.185.124:443 nq
  -> 172.7.21.4:8443              Masq    1      0          0         
  -> 172.7.22.6:8443              Masq    1      0          0         
TCP  192.168.238.22:80 nq
  -> 172.7.22.5:8082              Masq    1      0          0         
UDP  192.168.0.2:53 nq

Method 2: the dashboard (adjust the replica count in the UI)

2. Summary: how external traffic reaches the cluster

A laptop browser opens dashboard.od.com. DNS resolves the name to the VIP; say the VIP currently sits on hdss7-21, so the traffic enters the layer-7 nginx on hdss7-21. nginx sees that the requested host is dashboard.od.com and matches the dedicated dashboard.od.com server block (any other od.com host falls through to *.od.com). If the request did not specify https, it is effectively http://dashboard.od.com, so the plain-HTTP listener issues rewrite ^(.*)$ https://${server_name}$1 permanent and redirects to 443; the 443 server block terminates (offloads) SSL and hands the traffic to the ingress controller on the nodes. (The ingress controller, traefik, runs as a pod on every node in this setup, so traffic is not round-robined to some subset of nodes.) The traefik container listens on port 81 of each host, so traffic enters the container through the host's port 81. The ingress controller then matches the Ingress resource with host: dashboard.od.com and its root path, and forwards the traffic to the dashboard Service (192.168.185.124:443). kube-proxy has wired that Service to the two dashboard replica pods, and with the ipvs scheduler set to nq the Service balances the traffic across the two pods, which take turns serving the requests. What lets the two pods on different hosts reach each other is the CNI plugin, flannel.
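For reference, a dashboard.od.com server block of the kind described above would look roughly like this (a sketch; the listen ports, certificate paths and the upstream name are assumptions, not the exact file from this environment):

server {
    listen       80;
    server_name  dashboard.od.com;
    # plain HTTP is redirected to HTTPS
    rewrite ^(.*)$ https://${server_name}$1 permanent;
}
server {
    listen       443 ssl;
    server_name  dashboard.od.com;
    # SSL is terminated here, then traffic goes to the traefik pods on host port 81
    ssl_certificate     "certs/dashboard.od.com.crt";
    ssl_certificate_key "certs/dashboard.od.com.key";
    location / {
        proxy_pass http://default_backend_traefik;
        proxy_set_header Host       $http_host;
        proxy_set_header x-forwarded-for $proxy_add_x_forwarded_for;
    }
}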
