Please credit the source when reposting: Troubleshooting k8s service deployment errors

Troubleshooting errors is an unavoidable part of working with k8s. The following steps can serve as a reference:

First, check the state of the pods with:

kubectl get pods

The output looks like this:

[zzq@localhost zzq]$ kubectl get pods
NAME                          READY     STATUS                  RESTARTS   AGE
report-api-57f64db6c7-6zksv   0/1       Init:0/1                0          1m
report-api-57f64db6c7-mqn7x   0/1       Init:CrashLoopBackOff   2          1m
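When a namespace holds many pods, the unhealthy ones can be picked out of this table automatically. The following is a minimal sketch and not part of the original workflow: unready_pods is a hypothetical helper name, and it assumes the default column layout shown above (NAME, READY, STATUS, ...). Pods that finished normally (STATUS Completed) also show an incomplete READY count, so they would be listed as well.

```shell
# unready_pods: read `kubectl get pods` table output on stdin and print
# "NAME STATUS" for every pod whose READY count (for example 0/1) is incomplete.
unready_pods() {
  awk 'NR > 1 {
         split($2, ready, "/")   # READY column has the form ready/total
         if (ready[1] != ready[2]) print $1, $3
       }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods | unready_pods
```

The names it prints are the ones to pass to kubectl describe pod in the next step.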

Take the name of the failing pod and inspect it in detail with:

kubectl describe pod report-api-57f64db6c7-mqn7x

Here report-api-57f64db6c7-mqn7x is the NAME from the kubectl get pods output above.
The output looks like this:

[zzq@localhost zzq]$ kubectl describe pod report-api-57f64db6c7-mqn7x
Name:           report-api-57f64db6c7-mqn7x
Namespace:      default
Node:           ip-10-10-133-37.cn-northwest-1.compute.internal/10.10.133.37
Start Time:     Thu, 13 Sep 2018 14:32:10 +0800
Labels:         app=report-api
                pod-template-hash=1392086273
Annotations:    kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container report-api; cpu request for init container pull-lib
Status:         Pending
IP:             100.96.6.2
Controlled By:  ReplicaSet/report-api-57f64db6c7
Init Containers:
  pull-lib:
    Container ID:  docker://41ada0ce00b3c724466abc3f2b945c7e85f59244ac52623ef235216b3adb64f6
    Image:         anigeo/awscli:latest
    Image ID:      docker-pullable://anigeo/awscli@sha256:910a18d43a9e936f38313b0dc44fbd7dc25303fab4ea89c4d7b082fefc654c8d
    Port:          <none>
    Host Port:     <none>
    Args:
      s3
      cp
      s3://general-data-group/lib/report-api/report-api-1.0.0-SNAPSHOT.jar
      /jar/
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 13 Sep 2018 14:35:52 +0800
      Finished:     Thu, 13 Sep 2018 14:35:52 +0800
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:  100m
    Environment:
      AWS_DEFAULT_REGION:  cn-northwest-1
    Mounts:
      /jar from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gs59h (ro)
Containers:
  report-api:
    Container ID:  
    Image:         java:8
    Image ID:      
    Port:          9999/TCP
    Host Port:     0/TCP
    Command:
      java
    Args:
      -jar
      /jar/report-api-1.0.0-SNAPSHOT.jar
      --spring.profiles.active=prod
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /jar from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gs59h (ro)
Conditions:
  Type           Status
  Initialized    False 
  Ready          False 
  PodScheduled   True 
Volumes:
  workdir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  default-token-gs59h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gs59h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age              From                                                      Message
  ----     ------                 ----             ----                                                      -------
  Normal   Scheduled              4m               default-scheduler                                         Successfully assigned report-api-57f64db6c7-mqn7x to ip-10-10-133-37.cn-northwest-1.compute.internal
  Normal   SuccessfulMountVolume  4m               kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  MountVolume.SetUp succeeded for volume "workdir"
  Normal   SuccessfulMountVolume  4m               kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-gs59h"
  Normal   Pulling                3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  pulling image "anigeo/awscli:latest"
  Normal   Pulled                 3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Successfully pulled image "anigeo/awscli:latest"
  Normal   Created                3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Created container
  Normal   Started                3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Started container
  Warning  BackOff                3m (x7 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Back-off restarting failed container
[zzq@localhost zzq]$ 

Reading the output carefully shows that the failure is in the pull-lib step under Init Containers, which is stuck in CrashLoopBackOff:

 pull-lib:
    Container ID:  docker://41ada0ce00b3c724466abc3f2b945c7e85f59244ac52623ef235216b3adb64f6
    Image:         anigeo/awscli:latest
    Image ID:      docker-pullable://anigeo/awscli@sha256:910a18d43a9e936f38313b0dc44fbd7dc25303fab4ea89c4d7b082fefc654c8d
    Port:          <none>
    Host Port:     <none>
    Args:
      s3
      cp
      s3://general-data-group/lib/report-api/report-api-1.0.0-SNAPSHOT.jar
      /jar/
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 13 Sep 2018 14:35:52 +0800
      Finished:     Thu, 13 Sep 2018 14:35:52 +0800
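The container name and waiting reason can also be pulled out of the full kubectl describe pod output mechanically instead of by eye. This is a rough sketch under assumptions: waiting_containers is a hypothetical helper, and it relies on the indentation of the full output shown earlier (container names at two spaces, State at four, Reason at six), which other kubectl versions may lay out differently.

```shell
# waiting_containers: read `kubectl describe pod` output on stdin and print
# "CONTAINER REASON" for every container whose current State is Waiting.
waiting_containers() {
  awk '
    # A container name is a line indented by exactly two spaces ending in ":".
    /^  [^ ].*:[ ]*$/ { name = $1; sub(/:$/, "", name) }
    # The current state is indented by four spaces ("Last State" does not match).
    /^    State:/ { state = $2 }
    # The first Reason after a Waiting state explains why the container waits.
    /^      Reason:/ && state == "Waiting" { print name, $2; state = "" }
  '
}

# Usage against a live cluster (not run here):
#   kubectl describe pod report-api-57f64db6c7-mqn7x | waiting_containers
```

For the pod in this article it would report pull-lib with CrashLoopBackOff and report-api with PodInitializing, which points at the init container as the one to investigate.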

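Looking at the configuration, this step is meant to copy a file from S3 (the s3 cp in Args), but the describe output does not say why it fails.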

Use the logs command to see what happened in the pull-lib step:

kubectl logs  report-api-57f64db6c7-mqn7x --container pull-lib

The output looks like this:

[zzq@localhost zzq]$ kubectl logs  report-api-57f64db6c7-mqn7x --container pull-lib
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
[zzq@localhost zzq]$ 

403 Forbidden means the credentials used by the init container do not have permission to access the S3 object.
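One way to confirm a permission fix before redeploying is to repeat the failing HeadObject call by hand. The sketch below is an illustration, not the author's procedure: s3_bucket, s3_key and check_s3_object are hypothetical helper names, and it assumes the AWS CLI is installed where it runs. The result is only meaningful if the call is made with the same credentials the pod uses; a workstation's own credentials may succeed while the pod's still fail.

```shell
# Split an s3://bucket/key URI using POSIX parameter expansion.
s3_bucket() { rest="${1#s3://}"; printf '%s\n' "${rest%%/*}"; }
s3_key()    { rest="${1#s3://}"; printf '%s\n' "${rest#*/}"; }

# check_s3_object: issue a HeadObject request for the given S3 URI with
# whatever credentials the current environment provides.
check_s3_object() {
  aws s3api head-object --bucket "$(s3_bucket "$1")" --key "$(s3_key "$1")"
}

# Usage (not run here):
#   check_s3_object s3://general-data-group/lib/report-api/report-api-1.0.0-SNAPSHOT.jar
```

If the call prints the object's metadata, the credentials in use can read the object; a repeated 403 means the permissions still need work.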

Once the permission problem is fixed, redeploy the service.

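Once the program has started correctly, follow its runtime logs with the command below. If the redeploy created a new pod, use its current name from kubectl get pods: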

kubectl logs  report-api-57f64db6c7-mqn7x -f

To open a shell inside the container:

kubectl exec -ti report-api-797c974bff-bdxjf -- bash
