OpenShift 4.x HOL Tutorial Series
Note: this article has been verified in an OpenShift 4.6 environment.

How an OpenShift Node Runs Containers

OpenShift 4 cluster nodes use a CRI-O-based container runtime. The kubelet on each node calls CRI-O over gRPC, and CRI-O runs OCI-compliant containers.

  1. Kubernetes contacts the kubelet to start a pod.
  2. The kubelet forwards the request through the CRI (Container Runtime Interface) to the CRI-O daemon, which starts the new pod.
  3. CRI-O pulls the required image from a container registry.
  4. The downloaded image is unpacked into the container's root filesystem.
  5. Once the root filesystem has been created for the container, CRI-O generates an OCI runtime specification JSON file that describes how to run the container using an OCI generate tool (see the sketch after this list for how to view this file on a node).
  6. CRI-O then uses that specification to launch an OCI-compatible runtime, which runs the container process. The default OCI runtime is runc.
  7. Each container is monitored by a separate conmon (container monitor) process, which handles the container's logging and records the exit code of the container process.
  8. The pod's network is set up through CNI, so any CNI plugin can be used with CRI-O.
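If you want to look at the OCI runtime specification mentioned in step 5, the following sketch locates a running container's bundle directory and prints a few fields from its config.json. Treat it as a sketch: the bundle path under /run/containers/storage/overlay-containers/<ID>/userdata is inferred from the conmon arguments shown later in this article, so verify it on your own node.
sh-4.4# CID=$(crictl ps -q | head -1)                         # pick any running container
sh-4.4# FULL_CID=$(crictl inspect $CID | jq -r .status.id)    # full 64-character container ID
sh-4.4# jq '{args: .process.args, rootfs: .root.path}' \
          /run/containers/storage/overlay-containers/$FULL_CID/userdata/config.json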


Viewing the Kubelet Service on a Node

  1. List the nodes of the OpenShift cluster, then open a debug session on one of the worker nodes and chroot into the host environment.
$ oc get nodes -o wide
NAME                                              STATUS   ROLES    AGE    VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-134-103.ap-southeast-1.compute.internal   Ready    master   143m   v1.19.0+d59ce34   10.0.134.103   <none>        Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
ip-10-0-157-96.ap-southeast-1.compute.internal    Ready    worker   129m   v1.19.0+d59ce34   10.0.157.96    <none>        Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
ip-10-0-178-197.ap-southeast-1.compute.internal   Ready    worker   129m   v1.19.0+d59ce34   10.0.178.197   <none>        Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
ip-10-0-178-236.ap-southeast-1.compute.internal   Ready    master   143m   v1.19.0+d59ce34   10.0.178.236   <none>        Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
ip-10-0-221-178.ap-southeast-1.compute.internal   Ready    master   143m   v1.19.0+d59ce34   10.0.221.178   <none>        Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
 
$ oc debug node/<WORKER_NODE>
Starting pod/ip-10-0-157-96ap-southeast-1computeinternal-debug ...
To use host binaries, run `chroot /host`
 
sh-4.4# chroot /host
  2. Check the status of the kubelet service.
sh-4.4# systemctl status kubelet
● kubelet.service - MCO environment configuration
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-env.conf
   Active: active (running) since Wed 2020-12-09 10:34:43 UTC; 21h ago
  Process: 1614 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 1612 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 1616 (kubelet)
    Tasks: 46 (limit: 406641)
   Memory: 278.7M
      CPU: 1h 49min 16.947s
   CGroup: /system.slice/kubelet.service
           └─1616 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime->
 
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.074134    1616 exec.go:60] Exec probe response: ""
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.074157    1616 prober.go:133] Readiness probe for "sdn-rmvhr_openshift-sdn(c31b8a10-5cde-4a24-a02c-64fd71c1ddc1):sdn" succeeded
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.162142    1616 prober.go:166] Exec-Probe Pod: certified-operators-558f675d4f-pcnjt, Container: certified-operators, Command: [grpc_health_p>
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.175948    1616 prober.go:181] HTTP-Probe Host: http://10.128.2.28, Port: 8080, Path: /plugins/
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.175968    1616 prober.go:184] HTTP-Probe Headers: map[]
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.177536    1616 http.go:117] Non fatal body truncation for http://10.128.2.28:8080/plugins/, Response: {200 OK 200 HTTP/1.1 1 1 map[Accept-R>
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.177580    1616 http.go:128] Probe succeeded for http://10.128.2.28:8080/plugins/, Response: {200 OK 200 HTTP/1.1 1 1 map[Accept-Ranges:[byt>
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.177646    1616 prober.go:133] Liveness probe for "plugin-registry-579847b4bf-rpscf_codeready(6ebf4641-9ecd-4527-a850-5d3541a00e4c):che-plug>
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.218115    1616 exec.go:60] Exec probe response: "status: SERVING\n"
Dec 10 08:02:43 ip-10-0-157-96 hyperkube[1616]: I0910 08:02:43.218135    1616 prober.go:133] Readiness probe for "certified-operators-558f675d4f-pcnjt_openshift-marketplace(1efe7943-3092-4097-8b14-764b2>
  3. View the full kubelet startup arguments and confirm that its "container-runtime-endpoint" is the CRI-O socket at "/var/run/crio/crio.sock".
sh-4.4# systemctl status kubelet | grep crio
           └─1616 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=aws --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb9ab6f21487d70c0fda256729adc82845aa3b68f9b84be18892d3096999d055 --v=4
  4. View the kubelet configuration file. (The sketch after the listing shows how these MCO-managed settings can be changed through a KubeletConfig custom resource.)
sh-4.4# more /etc/kubernetes/kubelet.conf
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  x509:
    clientCAFile: /etc/kubernetes/kubelet-ca.crt
  anonymous:
    enabled: false
cgroupDriver: systemd
cgroupRoot: /
clusterDNS:
  - 172.30.0.10
clusterDomain: cluster.local
containerLogMaxSize: 50Mi
maxPods: 250
kubeAPIQPS: 50
kubeAPIBurst: 100
rotateCertificates: true
serializeImagePulls: false
staticPodPath: /etc/kubernetes/manifests
systemCgroups: /system.slice
systemReserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi
featureGates:
  LegacyNodeRoleBehavior: false
  NodeDisruptionExclusion: true
  RotateKubeletServerCertificate: true
  SCTPSupport: true
  ServiceNodeExclusion: true
  SupportPodPidsLimit: true
serverTLSBootstrap: true
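On OpenShift this file is rendered by the Machine Config Operator, so settings are normally changed through a KubeletConfig custom resource rather than by editing /etc/kubernetes/kubelet.conf directly. The following is only a sketch: the resource name and maxPods value are examples, and the pool selector label is assumed from the default worker MachineConfigPool (check it with oc get mcp worker --show-labels).
$ oc apply -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-max-pods                  # hypothetical name for this example
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""   # assumed label on the worker pool
  kubeletConfig:
    maxPods: 500                         # example value; MCO drains and reboots nodes to apply it
EOF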

Viewing the CRI-O Service on a Node

  1. View the crictl configuration. By default crictl connects to the node's local "unix:///var/run/crio/crio.sock" (see the quick check sketched after the output).
sh-4.4# cat /etc/crictl.yaml
runtime-endpoint: unix:///var/run/crio/crio.sock
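As a quick sanity check (not part of the original lab steps, just a sketch), you can talk to CRI-O over that same socket and look at its runtime conditions:
sh-4.4# crictl version                           # CRI API version plus the CRI-O runtime version
sh-4.4# crictl info | jq '.status.conditions'    # RuntimeReady and NetworkReady conditions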
  2. Check the "pids_limit" parameter in the CRI-O configuration file and confirm that the default is 1024. (The sketch after the output shows how to change this value cluster-wide.)
sh-4.4# cat /etc/crio/crio.conf | grep -v "#"  | sed '/^$/d' |grep -i pids_limit
pids_limit = 1024
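On OpenShift /etc/crio/crio.conf is also managed by the MCO, so pids_limit is normally changed cluster-wide through a ContainerRuntimeConfig custom resource instead of editing the file on each node. The snippet below is only a sketch: the resource name and the 2048 value are examples, and the pool selector label is assumed from the default worker MachineConfigPool.
$ oc apply -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: worker-pids-limit                # hypothetical name for this example
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""   # assumed label on the worker pool
  containerRuntimeConfig:
    pidsLimit: 2048                      # example value; MCO rolls it out to all worker nodes
EOF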
  3. View the systemd unit file of the crio service.
sh-4.4# more /usr/lib/systemd/system/crio.service
[Unit]
Description=Open Container Initiative Daemon
Documentation=https://github.com/cri-o/cri-o
Requires=crio-wipe.service
After=network-online.target crio-wipe.service
 
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/crio
EnvironmentFile=-/etc/sysconfig/crio-metrics
EnvironmentFile=-/etc/sysconfig/crio-network
EnvironmentFile=-/etc/sysconfig/crio-storage
Environment=GOTRACEBACK=crash
ExecStart=/usr/bin/crio \
          $CRIO_STORAGE_OPTIONS \
          $CRIO_NETWORK_OPTIONS \
          $CRIO_METRICS_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
TasksMax=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
OOMScoreAdjust=-999
TimeoutStartSec=0
Restart=on-abnormal
 
[Install]
WantedBy=multi-user.target
  4. Check the running status of the crio service. The "MCO environment configuration" description shows that the crio configuration is managed by OpenShift's Machine Config Operator (MCO).
sh-4.4# systemctl status crio
● crio.service - MCO environment configuration
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-mco-default-env.conf
   Active: active (running) since Wed 2020-12-09 10:34:43 UTC; 4h 37min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 1569 (crio)
    Tasks: 38
   Memory: 7.1G
      CPU: 31min 11.903s
   CGroup: /system.slice/crio.service
           └─1569 /usr/bin/crio --enable-metrics=true --metrics-port=9537
 
Dec 09 14:49:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 14:49:44.015501015Z" level=info msg="Checking image status: quay.io/openshift-release-dev/ocp-v4.0-art-d>
Dec 09 14:49:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 14:49:44.016383913Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e66662827187986d>
Dec 09 14:54:44 ip-10-0-157-96 crio[1569]: time="2020-09-09 14:54:44.019052497Z" level=info msg="Checking image status: quay.io/openshift-release-dev/ocp-v4.0-art-d>
Dec 09 14:54:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 14:54:44.019951959Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e66662827187986d>
Dec 09 14:59:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 14:59:44.022730850Z" level=info msg="Checking image status: quay.io/openshift-release-dev/ocp-v4.0-art-d>
Dec 09 14:59:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 14:59:44.023704574Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e66662827187986d>
Dec 09 15:04:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 15:04:44.026452786Z" level=info msg="Checking image status: quay.io/openshift-release-dev/ocp-v4.0-art-d>
Dec 09 15:04:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 15:04:44.027387909Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e66662827187986d>
Dec 09 15:09:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 15:09:44.030071204Z" level=info msg="Checking image status: quay.io/openshift-release-dev/ocp-v4.0-art-d>
Dec 09 15:09:44 ip-10-0-157-96 crio[1569]: time="2020-12-09 15:09:44.030978428Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e66662827187986d>
  5. View the logs of the crio service.
sh-4.4# journalctl -u crio
-- Logs begin at Wed 2020-12-09 10:30:54 UTC, end at Wed 2020-12-09 15:55:36 UTC. --
Dec 09 10:34:43 ip-10-0-157-96 systemd[1]: Starting MCO environment configuration...
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.252899572Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID>
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.254475885Z" level=info msg="Using conmon executable: /usr/libexec/crio/conmon"
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.259087353Z" level=info msg="Conmon does not support the --sync option"
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.259613545Z" level=info msg="No seccomp profile specified, using the internal default"
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.259626109Z" level=info msg="AppArmor is disabled by the system or at CRI-O build-time"
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.260772632Z" level=info msg="Update default CNI network name to "
Dec 09 10:34:43 ip-10-0-157-96 systemd[1]: Started MCO environment configuration.
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-09-09 10:34:43.821452311Z" level=info msg="Checking image status: quay.io/openshift-release-dev/ocp-v4.0-art-d>
Dec 09 10:34:43 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:34:43.821962249Z" level=info msg="Image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb9ab6f>
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.638807987Z" level=info msg="Running pod sandbox: openshift-sdn/ovs-6rtwz/POD" id=c2d4e60d-2b1a->
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.645679421Z" level=info msg="Running pod sandbox: openshift-machine-config-operator/machine-conf>
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.655058843Z" level=info msg="Running pod sandbox: openshift-monitoring/node-exporter-jrxw6/POD" >
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.659700454Z" level=info msg="Running pod sandbox: openshift-image-registry/node-ca-lmbhb/POD" id>
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.665450167Z" level=info msg="Running pod sandbox: openshift-sdn/sdn-rmvhr/POD" id=bd1dd1e4-c2fc->
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.676454651Z" level=info msg="Running pod sandbox: openshift-multus/multus-79h7g/POD" id=22e1c686>
Dec 09 10:35:22 ip-10-0-157-96 crio[1569]: time="2020-12-09 10:35:22.683559219Z" level=info msg="Running pod sandbox: openshift-cluster-node-tuning-operator/tuned-h>
...
  6. Restart the crio service. As soon as the following command runs you will be dropped out of the "oc debug" session; once crio has finished starting, enter "oc debug" again and check the crio logs (or the quick checks sketched below) to confirm that it restarted.
sh-4.4# systemctl restart crio
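After re-entering the debug session, a few quick checks (a sketch; the status commands shown earlier work just as well) confirm that crio restarted cleanly and the node's containers are back:
sh-4.4# systemctl is-active crio                            # should print "active"
sh-4.4# journalctl -u crio --since "10 min ago" | tail -5   # most recent startup messages
sh-4.4# crictl ps -q | wc -l                                # number of running containers on this node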

Viewing the conmon Monitor Processes

  1. The OpenShift Machine Config Daemon runs as a DaemonSet on every node. Run the following commands to find the name and ID of its Pod in the openshift-machine-config-operator project.
sh-4.4# crictl pods --namespace openshift-machine-config-operator
POD ID              CREATED             STATE               NAME                          NAMESPACE                           ATTEMPT
ba978ea7afe83       18 hours ago        Ready               machine-config-daemon-8bblm   openshift-machine-config-operator   0
sh-4.4# MCD_POD_NAME=machine-config-daemon-8bblm
sh-4.4# MCD_POD_ID=ba978ea7afe83
sh-4.4# MCD_FULL_POD_ID=$(crictl inspectp $MCD_POD_ID | jq .status.id | cut -d "\"" -f 2)
  2. Use MCD_FULL_POD_ID to list the containers inside that Pod. The Pod contains two containers, named machine-config-daemon and oauth-proxy. (The sketch after the output shows how to map one of them to its runc state.)
sh-4.4# crictl ps -p $MCD_FULL_POD_ID
CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                     ATTEMPT             POD ID
a925dcb856164       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:21c49efa4fd9a4c6747c32cc6b2b0f877694d3fa5b3d3f66230129e603b152f0   24 hours ago        Running             oauth-proxy              0                   ba978ea7afe83
cbfbb31e114ca       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3a8e6a396a9f399ed2493fe5c65ec8e8aecd0f83d45f162de63aef9c2d88400   24 hours ago        Running             machine-config-daemon    0                   ba978ea7afe83
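To tie one of these containers back to the underlying OCI runtime, the sketch below reads the container's host PID from crictl and asks runc for its state. The --root /run/runc path is taken from the conmon arguments shown in the next step; the .info.pid field is how CRI-O exposes the PID through crictl, but verify both on your own cluster.
sh-4.4# MCD_CID=$(crictl ps --name machine-config-daemon -q | head -1)
sh-4.4# crictl inspect $MCD_CID | jq '.info.pid'                   # host PID of the container process
sh-4.4# runc --root /run/runc state $MCD_CID | jq '.status, .pid'  # OCI runtime view of the same container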
  3. Search for processes whose command line contains both the Pod name and conmon. There are three "/usr/libexec/crio/conmon" processes: two of them correspond, by their arguments, to the two containers found in the previous step, and the third corresponds to the Pod's sandbox (pause) container itself. (The log and exit-code paths they use are followed in the sketch after the output.)
sh-4.4# ps -ef | grep $MCD_POD_NAME | grep conmon
root        2090       1  0 Dec09 ?        00:00:00 /usr/libexec/crio/conmon -s -c ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b -n k8s_POD_machine-config-daemon-8bblm_openshift-machine-config-operator_b1239d33-1d26-4172-9018-9d7d478f9dfe_0 -u ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b -r /usr/bin/runc -b /var/run/containers/storage/overlay-containers/ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b/userdata --persist-dir /var/lib/containers/storage/overlay-containers/ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b/userdata -p /var/run/containers/storage/overlay-containers/ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b/userdata/pidfile -P /var/run/containers/storage/overlay-containers/ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b/userdata/conmon-pidfile -l /var/log/pods/openshift-machine-config-operator_machine-config-daemon-8bblm_b1239d33-1d26-4172-9018-9d7d478f9dfe/ba978ea7afe836c1fd0189e6b9198a58dd931a6610dc8a0bc5084b3c489f634b.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level info --runtime-arg --root=/run/runc
root        2707       1  0 Dec09 ?        00:00:00 /usr/libexec/crio/conmon -s -c cbfbb31e114ca4789906666f97f51a641da3ea568d026cb7d5216a6d379bc731 -n k8s_machine-config-daemon_machine-config-daemon-8bblm_openshift-machine-config-operator_b1239d33-1d26-4172-9018-9d7d478f9dfe_0 -u cbfbb31e114ca4789906666f97f51a641da3ea568d026cb7d5216a6d379bc731 -r /usr/bin/runc -b /var/run/containers/storage/overlay-containers/cbfbb31e114ca4789906666f97f51a641da3ea568d026cb7d5216a6d379bc731/userdata --persist-dir /var/lib/containers/storage/overlay-containers/cbfbb31e114ca4789906666f97f51a641da3ea568d026cb7d5216a6d379bc731/userdata -p /var/run/containers/storage/overlay-containers/cbfbb31e114ca4789906666f97f51a641da3ea568d026cb7d5216a6d379bc731/userdata/pidfile -P /var/run/containers/storage/overlay-containers/cbfbb31e114ca4789906666f97f51a641da3ea568d026cb7d5216a6d379bc731/userdata/conmon-pidfile -l /var/log/pods/openshift-machine-config-operator_machine-config-daemon-8bblm_b1239d33-1d26-4172-9018-9d7d478f9dfe/machine-config-daemon/0.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level info --runtime-arg --root=/run/runc
root        2990       1  0 Dec09 ?        00:00:00 /usr/libexec/crio/conmon -s -c a925dcb856164e6461ec3658a44b218348cb7d12ba36b323cd8196ac33be2993 -n k8s_oauth-proxy_machine-config-daemon-8bblm_openshift-machine-config-operator_b1239d33-1d26-4172-9018-9d7d478f9dfe_0 -u a925dcb856164e6461ec3658a44b218348cb7d12ba36b323cd8196ac33be2993 -r /usr/bin/runc -b /var/run/containers/storage/overlay-containers/a925dcb856164e6461ec3658a44b218348cb7d12ba36b323cd8196ac33be2993/userdata --persist-dir /var/lib/containers/storage/overlay-containers/a925dcb856164e6461ec3658a44b218348cb7d12ba36b323cd8196ac33be2993/userdata -p /var/run/containers/storage/overlay-containers/a925dcb856164e6461ec3658a44b218348cb7d12ba36b323cd8196ac33be2993/userdata/pidfile -P /var/run/containers/storage/overlay-containers/a925dcb856164e6461ec3658a44b218348cb7d12ba36b323cd8196ac33be2993/userdata/conmon-pidfile -l /var/log/pods/openshift-machine-config-operator_machine-config-daemon-8bblm_b1239d33-1d26-4172-9018-9d7d478f9dfe/oauth-proxy/0.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level info --runtime-arg --root=/run/runc
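The -l and --exit-dir arguments above show where conmon writes each container's log and exit code. The sketch below simply follows those paths (the glob over the Pod UID is used to avoid hard-coding it):
sh-4.4# tail -3 /var/log/pods/openshift-machine-config-operator_${MCD_POD_NAME}_*/machine-config-daemon/0.log
sh-4.4# ls /var/run/crio/exits        # exit-code files appear here when a container process terminates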

References

https://cri-o.io/#container-images
https://www.redhat.com/en/blog/red-hat-openshift-container-platform-4-now-defaults-cri-o-underlying-container-engine
https://www.openshift.com/blog/crictl-vs-podman
https://kubernetes.io/zh/docs/tasks/debug-application-cluster/crictl/
