K8s Secondary Development 02: Installing a k8s Cluster with kubeadm
Component overview
For the overall k8s architecture, see the earlier article: https://blog.csdn.net/liaomin416100569/article/details/86711655
Kubernetes consists of the following core components (required):
- etcd stores the state of the entire cluster;
- apiserver is the single entry point for resource operations and provides authentication, authorization, access control, API registration and discovery;
- controller manager maintains the cluster state, e.g. failure detection, auto scaling and rolling updates;
- scheduler handles resource scheduling, placing Pods onto the appropriate machines according to the configured scheduling policy;
- kubelet maintains the container lifecycle and manages Volumes (CVI) and networking (CNI); it is installed on every worker node;
- kube-proxy provides in-cluster service discovery and load balancing for Services; it is installed on every worker node.
Besides the core components there are also recommended add-ons (optional):
- kube-dns (CoreDNS) provides DNS service for the whole cluster, resolving Service names for the applications running in it.
Among these, kubelet is installed as a binary package; the other components run as docker images, and kubeadm pulls those images and initializes the environment.
kubectl is the client management tool, also a binary.
So kubelet, kubeadm and kubectl are installed via yum/apt-get, while the remaining components are installed by kubeadm.
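To see this split on a node, a quick check might look like the following (paths are the kubeadm defaults; the manifests only exist after kubeadm init):
# kubelet is a host service installed from the package manager
systemctl status kubelet
# the control-plane components run as static pods defined here
ls /etc/kubernetes/manifests
# their images are pulled by kubeadm
docker images | grep k8s.gcr.io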
Installing k8s
The simplest and most reliable way to install is actually RKE; kubeadm is used here in order to understand the role of each individual component. For the RKE installation guide see: https://docs.rancher.cn/docs/rke/installation/_index/.
If k8s is already installed and needs to be removed, follow these steps (reset k8s and delete all running containers and images):
kubeadm reset
docker ps -a | awk '{if(NR>1){print $1;system("docker stop "$1);system("docker rm "$1)}}';
docker images | awk '{if(NR>1)system("docker rmi "$3)}'
rm -rf $HOME/.kube
Also remove the network-related files described in the network installation section below.
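For reference, the network cleanup from that section boils down to the following (a sketch; see the Calico troubleshooting steps later for details):
# remove leftover CNI configuration and the Calico tunnel interface
rm -rf /etc/cni/net.d/*calico*
modprobe -r ipip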
Preparing the machines
Debian is used here with two machines, one master and one worker. Use static IPs, preferably in the same subnet:
k8s-master 10.10.0.115
k8s-worker 10.10.0.116
Install kubelet, kubeadm and kubectl (same steps on both machines)
Make sure docker is installed on both machines beforehand:
apt-get install docker-ce
Configure the Aliyun apt source
sudo vim /etc/apt/sources.list.d/kubernetes.list
# Add the Aliyun source below to the file
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
# Alternatively, use the USTC mirror
deb http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial main
Now run apt update; it will fail because the GPG key is missing. Add it with the commands below (E084DAB9 is the last 8 characters of the key shown in the error):
gpg --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
gpg --export --armor E084DAB9 | sudo apt-key add -
Download and install
apt-get update && apt-get install -y kubelet kubeadm kubectl
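Optionally, pin the package versions so a later apt upgrade does not move them unexpectedly (general apt practice, not specific to this guide):
apt-mark hold kubelet kubeadm kubectl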
Disable swap
If swap is not disabled, Kubernetes will run into errors; even if the installation succeeds, the Kubernetes server will fail after the node reboots.
# disable temporarily
sudo swapoff -a
# disable permanently
vim /etc/fstab
Just comment out the swap line.
For a VM it is best to allocate at least 2 GB of RAM; otherwise disabling swap can make the graphical desktop hard to get into.
In my case the swap entry is the commented line near the end of the file:
root@liaok8s:/home/mainte# more /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda1 during installation
UUID=52c86df1-9442-46c0-a29c-b8f7d0a421ab / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
#UUID=e4e792e9-d30b-4e4a-ab3c-26bd488adaae none swap sw 0 0
#/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0
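If you prefer to script this step, a minimal sed sketch that comments out an uncommented swap entry (verify /etc/fstab afterwards):
sed -ri 's/^([^#].*[[:space:]]swap[[:space:]].*)$/#\1/' /etc/fstab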
Pulling the images
Since the official registry is blocked, first list the required images and their versions, then pull them from the Aliyun mirror:
kubeadm config images list
After getting the image list, the following script pulls the images from Aliyun:
for i in `kubeadm config images list`; do
# the coredns image is k8s.gcr.io/coredns/coredns:v1.8.6 on gcr.io but registry.aliyuncs.com/google_containers/coredns:v1.8.6 on Aliyun, so it needs special handling
if echo $i | grep -q coredns/coredns;then
imageName=${i#k8s.gcr.io/coredns/}
docker pull registry.aliyuncs.com/google_containers/$imageName
docker tag registry.aliyuncs.com/google_containers/$imageName k8s.gcr.io/coredns/$imageName
docker rmi registry.aliyuncs.com/google_containers/$imageName
else
imageName=${i#k8s.gcr.io/}
docker pull registry.aliyuncs.com/google_containers/$imageName
docker tag registry.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
docker rmi registry.aliyuncs.com/google_containers/$imageName
fi
done;
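Alternatively, kubeadm itself can pull straight from the mirror; if you use this form, keep the same --image-repository on kubeadm init (as done below) so the image names match:
kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers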
Master node
Initialization
kubeadm init --service-cidr=10.96.0.0/12 --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16
Commonly used init parameters (an equivalent config-file form is sketched after this list):
--apiserver-advertise-address=10.10.0.115  the IP address of the master host; my master's IP is 10.10.0.115
--image-repository=registry.aliyuncs.com/google_containers  the image registry; since the official registry is unreachable, the Aliyun mirror registry.aliyuncs.com/google_containers is used. If it is not specified, you must pull the images manually as described in the image-pulling section above.
--kubernetes-version=v1.17.4  the k8s version to install
--service-cidr=10.96.0.0/12  use 10.96.0.0/12 as-is here and in future installs; do not change it
--pod-network-cidr=10.244.0.0/16  the IP range used for pod-to-pod networking inside k8s; it must not overlap with service-cidr. If unsure, use 10.244.0.0/16
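The same settings can also be expressed as a kubeadm config file; a minimal sketch (field names follow the kubeadm.k8s.io/v1beta3 API used by v1.23; the file name is just an example):
cat > kubeadm-config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.23.3
imageRepository: registry.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
EOF
kubeadm init --config kubeadm-config.yaml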
The init usually fails with errors like:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
According to the official docs this is caused by a cgroup driver mismatch between docker and kubelet. There are two options:
Option 1: make kubelet use docker's driver
Option 2: make docker use kubelet's driver
Here everything is aligned with kubelet's driver, and the corresponding service is restarted.
Resolution
Docker configuration file
Option 2 is taken here: docker's default driver is cgroupfs, so just add the following to /etc/docker/daemon.json:
"exec-opts": [
"native.cgroupdriver=systemd"
],
The configuration file after the change:
root@controlplane:~# cat /etc/docker/daemon.json
{
"exec-opts": [
"native.cgroupdriver=systemd"
],
"bip":"172.12.0.1/24",
"registry-mirrors": [
"http://docker-registry-mirror.kodekloud.com"
]
}
Restart docker
systemctl restart docker
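A quick way to confirm the driver after the restart (should print systemd):
docker info --format '{{.CgroupDriver}}'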
kubelet configuration file
A quick grep shows that kubelet's default cgroup driver is systemd:
root@controlplane:~# cat /var/lib/kubelet/config.yaml |grep group
cgroupDriver: systemd
After the change, reset and re-initialize; the init should now complete. Remember to pass the same flags as before (full command sketched below):
kubeadm reset && kubeadm init
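With the values used in this guide, that is (kubeadm reset -f skips the confirmation prompt):
kubeadm reset -f
kubeadm init --apiserver-advertise-address=10.10.0.115 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16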
On success it prints:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.10.0.115:6443 --token pedwrg.x3ocfkg6ui1t5yht \
--discovery-token-ca-cert-hash sha256:a696588d58710779c758a0cdc4f0da3154af5d62f9b54420b5bb78d63f11e7a2
Note: the kubeadm join command in the last lines is what worker nodes use to join the cluster; save it. If you lose it, regenerate it with:
kubeadm token create --print-join-command
A regular (non-root) user needs to run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
The root user can simply run:
export KUBECONFIG=/etc/kubernetes/admin.conf
Now look at all the kube-related processes running on the host:
root@liaok8s:/home/mainte# ps -ef | grep kube
root 7691 7622 2 16:30 ? 00:00:03 etcd --advertise-client-urls=https://10.10.0.115:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls=https://10.10.0.115:2380 --initial-cluster=liaok8s=https://10.10.0.115:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://10.10.0.115:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://10.10.0.115:2380 --name=liaok8s --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
root 7736 7648 8 16:30 ? 00:00:10 kube-apiserver --advertise-address=10.10.0.115 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
root 7744 7661 2 16:30 ? 00:00:03 kube-controller-manager --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-name=kubernetes --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --use-service-account-credentials=true
root 7752 7695 1 16:30 ? 00:00:01 kube-scheduler --authentication-kubeconfig=/etc/kubernetes/scheduler.conf --authorization-kubeconfig=/etc/kubernetes/scheduler.conf --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true
root 7965 1 2 16:30 ? 00:00:03 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6
root 8294 8274 0 16:31 ? 00:00:00 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=liaok8s
root 9929 1346 0 16:33 pts/0 00:00:00 grep kube
Check the system pods:
root@liaok8s:/home/mainte# kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
coredns-64897985d-f9l5d 0/1 ContainerCreating 0 3m18s
coredns-64897985d-gtbp6 0/1 ContainerCreating 0 3m18s
etcd-liaok8s 1/1 Running 2 3m23s
kube-apiserver-liaok8s 1/1 Running 2 3m25s
kube-controller-manager-liaok8s 1/1 Running 2 3m23s
kube-proxy-2phdl 1/1 Running 0 3m18s
kube-scheduler-liaok8s 1/1 Running 4 3m22s
Note that the DNS pods (which depend on the pod network) have 0 ready instances.
Also check the nodes (currently just the master, running v1.23.3):
root@liaok8s:/home/mainte# kubectl get node
NAME STATUS ROLES AGE VERSION
liaok8s Ready control-plane,master 4m9s v1.23.3
Check the version:
root@liaok8s:/home/mainte# kubectl version --short=true
Client Version: v1.23.3
Server Version: v1.23.3
Installing the network
After the installation above, querying the pods under kube-system shows the network-related pods stuck in Pending/ContainerCreating because no network plugin is installed yet. There are several network plugins to choose from (pick any one of the following).
The most popular ones are:
- flannel
- weave
- calico
Calico is used here as the example.
Calico is a pure layer-3 solution that provides multi-host communication for OpenStack VMs and Docker containers. Unlike overlay networks such as flannel or the libnetwork overlay driver, it uses virtual routing instead of virtual switching: each virtual router advertises reachability information (routes) to the rest of the data center via BGP. See the Calico documentation for how it works.
Do not install it directly with the command below: for inter-node communication Calico defaults to the IP of the first physical NIC, which may pick the wrong IP and break communication.
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Download calico.yaml first instead:
wget https://docs.projectcalico.org/manifests/calico.yaml
Edit the yaml: locate the container named calico-node, which looks roughly like this:
containers:
- name: calico-node
image: docker.io/calico/node:v3.22.0
Add a new variable to its env section:
- name: IP_AUTODETECTION_METHOD
value: "interface=ens.*"
This makes Calico detect the host IP from NICs whose names start with ens; check the actual NIC names on your hosts with ip addr. Then apply:
kubectl apply -f ./calico.yaml
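If the daemonset is already running, the same variable can also be set without re-editing the file (a sketch; it triggers a rolling restart of the calico-node pods):
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD='interface=ens.*'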
Check that the pods come up:
root@liaok8s:/home/mainte# kubectl get pods --namespace kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-566dc76669-dwlrk 1/1 Running 0 4m4s 192.168.134.129 liaok8s <none> <none>
calico-node-gtpcw 1/1 Running 0 4m4s 10.10.0.115 liaok8s <none> <none>
coredns-64897985d-f9l5d 1/1 Running 0 64m 172.16.134.131 liaok8s <none> <none>
coredns-64897985d-gtbp6 1/1 Running 0 64m 172.16.134.129 liaok8s <none> <none>
etcd-liaok8s 1/1 Running 2 64m 10.10.0.115 liaok8s <none> <none>
kube-apiserver-liaok8s 1/1 Running 2 64m 10.10.0.115 liaok8s <none> <none>
kube-controller-manager-liaok8s 1/1 Running 2 64m 10.10.0.115 liaok8s <none> <none>
kube-proxy-2phdl 1/1 Running 0 64m 10.10.0.115 liaok8s <none> <none>
kube-scheduler-liaok8s 1/1 Running 4 64m 10.10.0.115 liaok8s <none> <none>
If the installation fails, the most likely cause is leftovers from a previous reset that were not fully cleaned up. Clean up and reinstall:
1. Delete the plugin
kubectl delete -f ./calico.yaml
2. Check the network interfaces on every node for tunl0
If tunl0 exists, remove it:
modprobe -r ipip
3. Remove the CNI configuration files related to Calico
ls /etc/cni/net.d/
# remove the calico-related files:
rm -rf /etc/cni/net.d/*calico*
Delete all calico pods:
for i in `kubectl get pods --namespace kube-system`; do
if echo $i | grep -q calico; then
echo $i | awk '{system("kubectl delete --force pod "$1" --namespace kube-system")}'
fi
done;
4. Reinstall
kubectl apply -f ./calico.yaml
Worker node
As on the master node, the kubelet and docker cgroup drivers must match.
Edit /etc/docker/daemon.json and restart docker:
"exec-opts": [
"native.cgroupdriver=systemd"
],
Note that the worker node also needs these images (only some of them are used, e.g. pause); otherwise kubelet will fail to pull them.
Initially the images had not been pulled on the worker; describing the calico pod on the master showed the following events:
kubectl describe pods calico-node-ncwdc -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned kube-system/calico-node-ncwdc to pve-tmpl
Warning FailedCreatePodContainer 15m kubelet unable to ensure pod container exists: failed to create container for [kubepods burstable podb1e24b58-fe92-42d6-8857-61d4d36638a2] : mkdir /sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb1e24b58_fe92_42d6_8857_61d4d36638a2.slice: no such file or directory
Warning FailedCreatePodSandBox 15m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6": Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning FailedCreatePodSandBox 14m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6": Error response from daemon: Get "https://k8s.gcr.io/v2/": dial tcp 74.125.204.82:443: i/o timeout
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6": Error response from daemon: Get "https://k8s.gcr.io/v2/": dial tcp 142.251.8.82:443: i/o timeout
Warning FailedCreatePodSandBox 8m35s (x13 over 15m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6": Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning FailedCreatePodSandBox 8m6s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6": Error response from daemon: Get "https://k8s.gcr.io/v2/": dial tcp 108.177.125.82:443: i/o timeout
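The fix is to run the same image-pull script from the earlier section on the worker as well; at minimum, for the pause image referenced in the errors above:
docker pull registry.aliyuncs.com/google_containers/pause:3.6
docker tag registry.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6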
Join the cluster with the kubeadm join command saved earlier; if it was lost, regenerate it on the master:
kubeadm token create --print-join-command
kubeadm join 10.10.0.115:6443 --token dprbpd.gdjxay6moqf10d05 --discovery-token-ca-cert-hash sha256:a696588d58710779c758a0cdc4f0da3154af5d62f9b54420b5bb78d63f11e7a2
After joining successfully, check with docker ps:
root@pve-tmpl:/home/mainte# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
96170aa46f83 k8s.gcr.io/pause:3.6 "/pause" 13 hours ago Up 13 hours k8s_POD_nginx_default_7c3d47ef-f478-40b4-b154-e9334cff14e3_1
6c627df7a7ca f109b1742d34 "start_runit" 13 hours ago Up 13 hours k8s_calico-node_calico-node-j8x2d_kube-system_3875d252-86cf-47e4-8c86-e74f32356d51_1
c30f1e963eae k8s.gcr.io/pause:3.6 "/pause" 13 hours ago Up 13 hours k8s_POD_calico-node-j8x2d_kube-system_3875d252-86cf-47e4-8c86-e74f32356d51_1
0426d0f18715 9b7cc9982109 "/usr/local/bin/kube…" 13 hours ago Up 13 hours k8s_kube-proxy_kube-proxy-kql69_kube-system_838b79c1-14c1-46e1-a806-868ca3fa87d3_2
773974f94bba k8s.gcr.io/pause:3.6 "/pause" 13 hours ago Up 13 hours k8s_POD_kube-proxy-kql69_kube-system_838b79c1-14c1-46e1-a806-868ca3fa87d3_1
The worker node runs components such as pause, calico and kube-proxy; the binary processes are:
root@pve-tmpl:/home/mainte# ps -ef | grep kube
root 392 1 1 Feb10 ? 00:13:19 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6
root 967 947 0 Feb10 ? 00:00:08 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=pve-tmpl
Troubleshooting
Calico container problems
If the calico installation has problems, the calico-related pods under kube-system usually fail to run properly:
root@liaok8s:/usr/local/bin# kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-566dc76669-dllq7 1/1 Running 0 13h
calico-node-j2ffk 1/1 Running 0 13h
calico-node-j8x2d 1/1 Running 1 (13h ago) 13h
coredns-64897985d-f9l5d 1/1 Running 1 (13h ago) 17h
coredns-64897985d-gtbp6 1/1 Running 1 (13h ago) 17h
etcd-liaok8s 1/1 Running 3 (13h ago) 17h
kube-apiserver-liaok8s 1/1 Running 4 (13h ago) 17h
kube-controller-manager-liaok8s 1/1 Running 3 (13h ago) 17h
kube-proxy-2phdl 1/1 Running 1 (13h ago) 17h
kube-proxy-kql69 1/1 Running 2 (13h ago) 15h
kube-scheduler-liaok8s 1/1 Running 5 (13h ago) 17h
If any of these pods misbehaves, inspect its events to find the cause:
kubectl describe pod calico-node-j2ffk -n kube-system
In my case, applying calico.yaml without the modification above caused the following problem:
Warning Unhealthy 23s (x3 over 25s) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 20s kubelet Readiness probe failed: 2022-02-10 10:15:15.807 [INFO][250] confd/health.go 180: Number of node(s) with BGP peering established = 0
This is caused by Calico failing to detect the real physical host IPs for node-to-node communication. It can be diagnosed with calicoctl.
For installing calicoctl see the official docs: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
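A sketch of the binary install, assuming the v3.22.0 release asset naming (match the version to your calico/node image):
curl -L -o /usr/local/bin/calicoctl https://github.com/projectcalico/calico/releases/download/v3.22.0/calicoctl-linux-amd64
chmod +x /usr/local/bin/calicoctl
calicoctl version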
The correct result looks like this.
On the master node, PEER ADDRESS lists the IP of each worker node (one entry per worker) and INFO is Established:
root@liaok8s:/usr/local/bin# calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.10.0.116 | node-to-node mesh | up | 12:29:59 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
On a worker node, PEER ADDRESS lists the IPs of the other workers and of the master (one entry per peer), and INFO is Established:
root@pve-tmpl:/home/mainte# calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.10.0.115 | node-to-node mesh | up | 12:29:59 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
Note: if these IPs are not the physical host IPs but some NIC's internal address, communication problems will occur.
Both the master and the worker nodes have a tunl0 interface; the worker's tunl0 IP comes from that node's pod IP range.
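To see which range each node's tunl0 actually got (the second command assumes calicoctl from the previous section is installed and configured):
ip addr show tunl0
calicoctl get ippool -o wide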
Cannot ping the cluster IP or the service name
Because I had reset the cluster several times without fixing the pod IP range, the tunl0 interfaces of the master and the worker ended up in different subnets and could not communicate.
To diagnose the network, create a dnsutils pod under kube-system:
apiVersion: v1
kind: Pod
metadata:
name: dnsutils
spec:
containers:
- name: dnsutils
image: mydlqclub/dnsutils:1.3
imagePullPolicy: IfNotPresent
command: ["sleep","3600"]
Deploy the DNS tool image with kubectl
Deploy the DNS tool image above into Kubernetes with kubectl:
-n: the namespace to deploy into.
$ kubectl create -f dnsutils.yaml -n kube-system
Enter the DNS tool pod's shell
With the DNS tool deployed, use kubectl to get a shell inside the pod and run its tools to analyse the problem:
exec: run a command inside the specified pod container.
-i: pass stdin to the container.
-t: allocate a tty for an interactive shell.
-n: the namespace the DNS pod was deployed into.
$ kubectl exec -it dnsutils /bin/sh -n kube-system
Test with ping and nslookup
Inside the container's sh prompt, first use ping to check reachability of addresses inside and outside the cluster.
First note the dnsutils pod's IP; it is in the same 192.168.57.x range as the first coredns pod (192.168.57.136), on the same worker node.
Inside the container:
/ # ping 192.168.57.136        # coredns #1, same worker node: reachable
PING 192.168.57.136 (192.168.57.136): 56 data bytes
64 bytes from 192.168.57.136: seq=0 ttl=64 time=0.058 ms
64 bytes from 192.168.57.136: seq=1 ttl=64 time=0.045 ms
^C
--- 192.168.57.136 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.045/0.051/0.058 ms
/ # ping 192.168.134.132        # the coredns on the master: not reachable
PING 192.168.134.132 (192.168.134.132): 56 data bytes
^C
--- 192.168.134.132 ping statistics ---
11 packets transmitted, 0 packets received, 100% packet loss
Check the container's /etc/resolv.conf:
/ # more /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
The nameserver points to the cluster IP of the kube-dns service.
Note that a cluster IP cannot be pinged; it can only be reached as ip:port.
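So a more meaningful test than ping is to query the service on its actual port; for kube-dns that means sending a DNS query straight to the cluster IP (dnsutils ships nslookup, which accepts an explicit server argument):
/ # nslookup kubernetes.default 10.96.0.10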
Pinging an external site does not work either:
/ # ping www.baidu.com
^C
The kube-dns service actually load-balances across the two coredns pods, and one of them is reachable; point the nameserver at that one and try again:
#nameserver 10.96.0.10
nameserver 192.168.57.138
search kube-system.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Testing again, everything works:
/ # ping www.baidu.com
PING www.baidu.com (110.242.68.4): 56 data bytes
64 bytes from 110.242.68.4: seq=0 ttl=52 time=44.204 ms
64 bytes from 110.242.68.4: seq=1 ttl=52 time=44.330 ms
64 bytes from 110.242.68.4: seq=2 ttl=52 time=43.863 ms
64 bytes from 110.242.68.4: seq=3 ttl=52 time=44.107 ms
64 bytes from 110.242.68.4: seq=4 ttl=52 time=44.148 ms
64 bytes from 110.242.68.4: seq=5 ttl=52 time=44.329 ms
64 bytes from 110.242.68.4: seq=6 ttl=52 time=44.210 ms
64 bytes from 110.242.68.4: seq=7 ttl=52 time=44.204 ms
64 bytes from 110.242.68.4: seq=8 ttl=52 time=44.113 ms
64 bytes from 110.242.68.4: seq=9 ttl=52 time=44.161 ms
^C
--- www.baidu.com ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 43.863/44.166/44.330 ms
/ # ping nginx.default
PING nginx.default (10.99.35.228): 56 data bytes
^C
--- nginx.default ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
/ # ping httpd.default
PING httpd.default (10.98.37.38): 56 data bytes
^C
/ # wget nginx.default        # the service is in a different namespace, so append .namespace (and :port if needed) to the service name; the cluster IP cannot be pinged directly, but DNS does resolve it
Connecting to nginx.default (10.99.35.228:80)
index.html 100% |**********************************************************************************************************************************************************************************************| 615 0:00:00 ETA
/ # more index.html
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
/ #
One might think that putting all pod IPs in the same range would fix this, so I reset k8s and ran init again
with --pod-network-cidr=10.244.0.0/16, but it still did not work. Perhaps coredns simply should not have a pod on the master node.
Reinstall coredns, following https://github.com/coredns/deployment/tree/master/kubernetes
Download these two files:
wget https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/coredns.yaml.sed
wget https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/deploy.sh
Run the command below (-i specifies the cluster IP of the kube-dns service, default 10.96.0.10).
Note: first delete the existing coredns deployment and the kube-dns service in kube-system:
kubectl -n kube-system delete deploy coredns && kubectl -n kube-system delete svc kube-dns
chmod +x ./deploy.sh && ./deploy.sh -i 10.96.0.10 | kubectl apply -f -
If the command complains that jq is missing, install it first with yum, apt-get or apk. After the reinstall everything tested fine. However, this only produced a single coredns pod; although it runs normally the reason is still not entirely clear, and the Service distribution logic will be tested again later after adding another worker node.
Deployment test
Run on the master node:
kubectl run nginx --image=nginx
Check the nginx status:
root@liaok8s:/usr/local/bin# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 1 (13h ago) 14h 192.168.57.130 pve-tmpl <none> <none>
Check the events (if READY stays at 0):
kubectl describe pod nginx
Access the HTTP server via the pod IP:
root@liaok8s:/usr/local/bin# curl 192.168.57.130
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Create a service
root@liaok8s:/usr/local/bin# kubectl expose pod nginx --port=80
service/nginx exposed
Check the service:
root@liaok8s:/usr/local/bin# kubectl get service nginx -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
nginx ClusterIP 10.99.35.228 <none> 80/TCP 46s run=nginx
Access the container through the cluster IP (load-balanced):
root@liaok8s:/usr/local/bin# curl 10.99.35.228
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Verify DNS resolution
Deploy an Apache httpd service:
kubectl run httpd --image httpd
Once it is running, go into the worker node host and then into the nginx container to check that service names resolve.
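Alternatively, the dnsutils pod deployed earlier can do the same check (a sketch; service names as created above):
kubectl exec -it dnsutils -n kube-system -- nslookup httpd.default
kubectl exec -it dnsutils -n kube-system -- wget -qO- http://httpd.default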
Restart
If k8s stops, e.g. after an unexpected power loss, configure the cluster to start automatically.
Configure the environment variable:
# on CentOS
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile && source ~/.bash_profile
# on Debian
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bashrc && source ~/.bashrc
Enable the kubelet service on both master and worker nodes:
systemctl enable kubelet && systemctl restart kubelet
If kubelet fails to start, swap is most likely still enabled; run sudo swapoff -a and try again.
Graphical management tools
Kubernetes container orchestration is getting more and more attention, yet the barrier to entry is still high, mainly because:
- installing a cluster is complex and error-prone;
- compared with plain containers, Kubernetes introduces many new concepts and a steep learning curve;
- YAML files have to be written by hand and are hard to manage across environments;
- good real-world examples to learn from are scarce.
Kuboard is a free graphical management tool for Kubernetes that aims to help users get microservices running on Kubernetes quickly.
Installation
Following the Kubernetes installation docs on https://kuboard.cn, run the following on the master node:
kubectl apply -f https://kuboard.cn/install-script/kuboard.yaml
Check that Kuboard is running:
# kubectl get pods -l k8s.eip.work/name=kuboard -n kube-system
NAME READY STATUS RESTARTS AGE
kuboard-756d46c4d4-qh6cm 1/1 Running 0 101m
Make sure the kuboard pod is in the Running state.
Getting a token
You can obtain tokens for the administrator user and for a read-only user.
Kuboard plans to add built-in permission management; until then, if you need finer-grained access control, refer to the RBAC example.
The token below has ClusterAdmin privileges and can perform all operations:
# kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep kuboard-user | awk '{print $1}') -o go-template='{{.data.token}}' | base64 -d
The Kuboard service is exposed via NodePort 32567; access Kuboard at:
http://<any-worker-node-ip>:32567/
Enter the token obtained in the previous step to reach the Kuboard cluster overview page.
Kuboard v3.x
Kuboard v3.x supports managing multiple Kubernetes clusters. If you upgrade from Kuboard v1.0.x or v2.0.x, note:
- Kuboard v3.x and Kuboard v2.0.x can be used side by side;
- Kuboard v3.x supports amd64 (x86) and arm64 (armv8) CPUs.
See the Kuboard website: https://kuboard.cn/install/v3/install-in-k8s.html
Online installation:
kubectl apply -f https://addons.kuboard.cn/kuboard/kuboard-v3.yaml
Access Kuboard
Open http://your-node-ip-address:30080 in a browser.
Log in with the initial credentials:
Username: admin
Password: Kuboard123