I used the AWS Kubernetes Quickstart to create a Kubernetes cluster in a VPC and private subnet: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud.pdf. It was running fine for a while. I have Calico installed on my Kubernetes cluster. I have two nodes and a master. The calico pods on the master are running fine, the ones on the nodes are in crashloopbackoff state:
NAME READY STATUS RESTARTS AGE
calico-etcd-ztwjj 1/1 Running 1 55d
calico-kube-controllers-685755779f-ftm92 1/1 Running 2 55d
calico-node-gkjgl 1/2 CrashLoopBackOff 270 22h
calico-node-jxkvx 2/2 Running 4 55d
calico-node-mxhc5 1/2 CrashLoopBackOff 9 25m
Describing one of the crashed pods:
ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name: calico-node-gkjgl
Namespace: kube-system
Node: ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time: Mon, 17 Sep 2018 16:56:41 +0000
Labels: controller-revision-hash=185957727
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.0.0.237
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
Image: quay.io/calico/node:v3.1.1
Image ID: docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 Sep 2018 15:14:44 +0000
Finished: Tue, 18 Sep 2018 15:14:44 +0000
Ready: False
Restart Count: 270
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: kubeadm,bgp
CALICO_DISABLE_FILE_LOGGING: true
CALICO_K8S_NODE_REF: (v1:spec.nodeName)
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_IPV4POOL_IPIP: Always
FELIX_IPV6SUPPORT: false
FELIX_IPINIPMTU: 1440
FELIX_LOGSEVERITYSCREEN: info
IP: autodetect
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
install-cni:
Container ID: docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
Image: quay.io/calico/cni:v3.1.1
Image ID: docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Running
Started: Mon, 17 Sep 2018 17:11:52 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 17 Sep 2018 16:56:43 +0000
Finished: Mon, 17 Sep 2018 17:10:53 +0000
Ready: True
Restart Count: 1
Environment:
CNI_CONF_NAME: 10-calico.conflist
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
calico-cni-plugin-token-b7sfl:
Type: Secret (a volume populated by a Secret)
SecretName: calico-cni-plugin-token-b7sfl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
:NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m (x6072 over 22h) kubelet, ip-10-0-0-237.us-east-2.compute.internal Back-off restarting failed container
The logs for the same pod:
ubuntu@ip-10-0-1-133:~$ kubectl logs calico-node-gkjgl -n kube-system -c calico-node
2018-09-18 15:14:44.605 [INFO][8] startup.go 251: Early log level set to info
2018-09-18 15:14:44.605 [INFO][8] startup.go 269: Using stored node name from /var/lib/calico/nodename
2018-09-18 15:14:44.605 [INFO][8] startup.go 279: Determined node name: ip-10-0-0-237.us-east-2.compute.internal
2018-09-18 15:14:44.609 [INFO][8] startup.go 101: Skipping datastore connection test
2018-09-18 15:14:44.610 [INFO][8] startup.go 352: Building new node resource Name="ip-10-0-0-237.us-east-2.compute.internal"
2018-09-18 15:14:44.610 [INFO][8] startup.go 367: Initialize BGP data
2018-09-18 15:14:44.614 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface ens3: 10.0.0.237/19
2018-09-18 15:14:44.614 [INFO][8] startup.go 432: Node IPv4 changed, will check for conflicts
2018-09-18 15:14:44.618 [WARNING][8] startup.go 861: Calico node 'ip-10-0-0-237' is already using the IPv4 address 10.0.0.237.
2018-09-18 15:14:44.618 [WARNING][8] startup.go 1058: Terminating
Calico node failed to start
So it seems like there is a conflict finding the node IP address, or Calico seems to think the IP is already assigned to another node. Doing a quick search i found this thread: https://github.com/projectcalico/calico/issues/1628. I see that this should be resolved by setting the IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I'm assuming would be "can-reach=10.0.0.237". This config is an environment variable set on calico/node container. I have been attempting to shell into the container itself, but kubectl tells me the container is not found:
ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")
I'm suspecting this is due to Calico being unable to assign IPs. So I logged onto the host and attempt to shell on the container using docker:
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
So I guess there is no shell to execute in the container. Makes sense why Kubernetes couldn't execute that. I tried running commands externally to list environment variables, but I haven't been able to find any, I could be running these commands wrong however:
root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
Okay, so maybe I am going about this the wrong way. Should I attempt to change the Calico config files using Kubernetes and redeploy it? Where can I find these on my system? I haven't been able to find where to set the environment variables.
所有评论(0)