Problem Description

Environment:

Kubernetes version: v1.18.6

Network plugin: Calico

After the k8s cluster was installed, I ran a reset (kubeadm reset) and then another init. When deploying a project afterwards, the pods could not be created and stayed stuck in the container-creating state. Describing the affected pod showed the following in its Events section:

Events:
  Type     Reason                  Age               From                Message
  ----     ------                  ----              ----                -------
  Normal   Scheduled               19s               default-scheduler   Successfully assigned default/myweb-deployment-79ffbcbb48-tzmjm to hualisicn
  Warning  FailedCreatePodSandBox  18s               kubelet, hualisicn  Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "78865d613992e986b6bc4387292639482831128cb13cbde0796dd4b536d3c107" network for pod "myweb-deployment-79ffbcbb48-tzmjm": networkPlugin cni failed to set up pod "myweb-deployment-79ffbcbb48-tzmjm_default" network: error getting ClusterInformation: Get "https://[10.10.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "78865d613992e986b6bc4387292639482831128cb13cbde0796dd4b536d3c107" network for pod "myweb-deployment-79ffbcbb48-tzmjm": networkPlugin cni failed to teardown pod "myweb-deployment-79ffbcbb48-tzmjm_default" network: error getting ClusterInformation: Get "https://[10.10.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
  Normal   SandboxChanged          4s (x3 over 18s)  kubelet, hualisicn  Pod sandbox changed, it will be killed and re-created.

According to the message, the certificate is invalid.
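For context, the sequence that produced this state can be sketched as follows. The flags and the pod name are illustrative assumptions, not the exact commands from the original setup:

```shell
# Tear down the existing control plane on this node
sudo kubeadm reset -f

# Re-initialize the cluster (pod CIDR shown here is an assumption for Calico)
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# After deploying a workload, inspect the stuck pod's events
kubectl describe pod myweb-deployment-79ffbcbb48-tzmjm
```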

Resolution Process

After the problem appeared, I searched Google and Baidu and tried every plausible fix: a stale $HOME/.kube/config, leftover configuration under /etc/kubernetes/ (which is in fact deleted automatically during the reset), and so on. None of them solved the problem. The first scenario, by the way, breaks basic kubectl commands entirely, like this:

kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?

That symptom is caused by stale client configuration that was never refreshed; the fix is simply to copy /etc/kubernetes/admin.conf over it (for the root user).
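A minimal sketch of that fix, following the standard post-kubeadm-init steps for the root user:

```shell
# Refresh the kubeconfig with the newly generated admin credentials
mkdir -p $HOME/.kube
cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
```

After this, kubectl talks to the new API server instead of the pre-reset one.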

My problem, however, was different, and it took several hours to track down what was actually going on.

All kubectl commands worked without issue, but whenever I deployed a Deployment, its pods stayed in the "ContainerCreating" state, and describe showed exactly the events quoted in the problem description above. After extensive testing, it suddenly occurred to me: could the network plugin itself be the problem?

I redeployed Calico using the official manifest, and the problem disappeared. I then reproduced the scenario several more times, which all but confirmed this as the cause.
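A hedged sketch of that redeployment, assuming the manifest was originally applied from a local calico.yaml; the download URL shown is Calico's documented manifest location at the time and should be verified against your Calico version:

```shell
# Remove the old Calico resources, which still reference the pre-reset cluster CA
kubectl delete -f calico.yaml

# Reapply the official manifest so Calico picks up the new cluster credentials
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```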

With that, the problem was solved. That said, I do not yet deeply understand the underlying mechanism; if I gain a better understanding of this part later, I will add it here.
