k8s x509: certificate signed by unknown authority
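Problem description
Environment:
Kubernetes version: v1.18.6
Network plugin: Calico
After the k8s cluster was installed, I ran a reset (kubeadm reset) and then init again. When I deployed a project afterwards, the pod could not be created and stayed stuck in the container-creating state. The events section of the pod's description looked like this: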
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s default-scheduler Successfully assigned default/myweb-deployment-79ffbcbb48-tzmjm to hualisicn
Warning FailedCreatePodSandBox 18s kubelet, hualisicn Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "78865d613992e986b6bc4387292639482831128cb13cbde0796dd4b536d3c107" network for pod "myweb-deployment-79ffbcbb48-tzmjm": networkPlugin cni failed to set up pod "myweb-deployment-79ffbcbb48-tzmjm_default" network: error getting ClusterInformation: Get "https://[10.10.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "78865d613992e986b6bc4387292639482831128cb13cbde0796dd4b536d3c107" network for pod "myweb-deployment-79ffbcbb48-tzmjm": networkPlugin cni failed to teardown pod "myweb-deployment-79ffbcbb48-tzmjm_default" network: error getting ClusterInformation: Get "https://[10.10.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
Normal SandboxChanged 4s (x3 over 18s) kubelet, hualisicn Pod sandbox changed, it will be killed and re-created.
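According to the message, the certificate is invalid: the Calico CNI plugin calls the API server at https://[10.10.0.1]:443 but cannot verify its certificate against the "kubernetes" CA it holds.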
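Troubleshooting
Once the problem appeared, I searched Google and Baidu at length and tried every plausible fix I could find, for example: the $HOME/.kube/config file had not been cleaned up, or the configuration under /etc/kubernetes/ had not been cleaned up (in fact it is deleted automatically during reset), and so on. None of these solved the problem. The first case, the one involving $HOME/.kube/config, leaves even basic kubectl commands unable to run, as shown below: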
kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?
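That error comes down to the old configuration not having been updated: kubectl falls back to localhost:8080 when it finds no usable kubeconfig. Copying /etc/kubernetes/admin.conf over $HOME/.kube/config is enough to fix it (for the root user).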
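The copy step can be sketched as a small shell function. This is a minimal sketch, assuming the default paths mentioned in this post; `restore_kubeconfig` is a hypothetical helper name, not a kubeadm or kubectl command, and the `chmod 600` is my own addition because the file contains admin credentials.

```shell
# Sketch: restore a kubeconfig from the admin.conf generated by kubeadm init.
# restore_kubeconfig is a hypothetical helper name, not part of kubeadm.
restore_kubeconfig() {
  src="${1:-/etc/kubernetes/admin.conf}"   # source written by kubeadm init
  dst="${2:-$HOME/.kube/config}"           # file that kubectl reads by default
  mkdir -p "$(dirname "$dst")" || return 1
  cp "$src" "$dst" || return 1             # overwrite the stale copy
  chmod 600 "$dst"                         # assumption: keep credentials private
}

# Usage as root on the control-plane node:
#   restore_kubeconfig
#   kubectl get node
```

Both arguments are optional, so the function can also be pointed at another user's home directory if needed.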
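My problem, however, was a different one, and I spent several hours working out what it actually was.
kubectl commands ran without any trouble for me, but whenever I deployed a deployment, the corresponding pod stayed in the "ContainerCreating" state. Running describe on it showed the events listed in the problem description. After a great deal of testing, it suddenly occurred to me: could the network component be the culprit?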
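To find the affected pods quickly, the check described above can be sketched as a filter over `kubectl get pods` output. This is only an illustration: `stuck_pods` is a hypothetical helper name, and it assumes the default column layout for a single namespace (NAME READY STATUS RESTARTS AGE), so it would need adjusting for `--all-namespaces` output.

```shell
# Sketch: print the names of pods stuck in ContainerCreating.
# stuck_pods is a hypothetical helper; it reads `kubectl get pods` output
# on stdin and assumes the columns NAME READY STATUS RESTARTS AGE.
stuck_pods() {
  awk 'NR > 1 && $3 == "ContainerCreating" { print $1 }'
}

# Usage: describe every stuck pod in the current namespace.
#   kubectl get pods | stuck_pods | while read -r pod; do
#     kubectl describe pod "$pod"
#   done
```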
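I redeployed Calico using the official configuration, and to my surprise the problem was gone. I then reproduced the issue several more times, which more or less confirmed that this was the cause.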
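One way to read "redeploy" is to delete the Calico resources and apply the manifest again. The sketch below assumes exactly that; the post does not record the commands actually used, so treat `redeploy_calico` as a hypothetical helper and the manifest argument as whatever official Calico manifest you installed from. As a possibly related hint, kubeadm reset itself warns that it does not clean the CNI configuration under /etc/cni/net.d, so leftover Calico files from the previous cluster are one plausible source of the old CA. That is an assumption, not something verified here.

```shell
# Sketch: redeploy Calico from the manifest it was installed from.
# redeploy_calico is a hypothetical helper. KUBECTL may be overridden
# (for example KUBECTL=echo) to preview the commands without running them.
redeploy_calico() {
  manifest="$1"   # path or URL of the Calico manifest (supplied by the caller)
  [ -n "$manifest" ] || { echo "usage: redeploy_calico <manifest>" >&2; return 1; }
  # Deleting the manifest also removes Calico's CRDs and their data,
  # which is only reasonable on a freshly initialised cluster.
  "${KUBECTL:-kubectl}" delete -f "$manifest" --ignore-not-found || return 1
  "${KUBECTL:-kubectl}" apply -f "$manifest"
}

# Usage:
#   KUBECTL=echo redeploy_calico calico.yaml   # dry run, prints the commands
#   redeploy_calico calico.yaml                # real run
```

Once the calico-node pods are running again, the stuck pod should be re-created on its own, since the kubelet keeps retrying the sandbox setup.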
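With that, the problem is solved. I do not yet fully understand the underlying mechanism, though, and will add to this post once I have a better understanding of it.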