Docker fails to start: troubleshooting and fix
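Setup: a Windows 10 host runs VirtualBox, VirtualBox runs CentOS 7, and CentOS 7 runs Docker.
Docker is assigned a fixed IP in 192.168.1.x, and the host IP is also in 192.168.1.x.
1. The Windows 10 host rebooted unexpectedly on its own.
2. Afterwards the host was moved to a new network, where its IP was 192.168.0.x.
Docker then failed to start.
I went back to the original 192.168.1.x network to recover.
The recovery steps follow: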
I. After CentOS boots, Docker fails to start: troubleshooting
- 1. Run
docker ps -a
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
- 2. Run
systemctl start docker
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
- 3. Run
systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2021-02-15 03:17:58 UTC; 6min ago
Docs: https://docs.docker.com
Process: 2701 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE)
Main PID: 2701 (code=exited, status=1/FAILURE)
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.862400156Z" level=warning msg="mountpoint for pids not found"
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.862532666Z" level=info msg="Loading containers: start."
Feb 15 03:17:57 localhost dockerd[2701]: ..time="2021-02-15T03:17:57.866933410Z" level=info msg="Firewalld running: false"
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.957136073Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172....P address"
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.993371676Z" level=info msg="Loading containers: done."
Feb 15 03:17:58 localhost dockerd[2701]: time="2021-02-15T03:17:58.007806591Z" level=fatal msg="Error creating cluster component: error while loading TLS C...yet valid"
Feb 15 03:17:58 localhost systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Feb 15 03:17:58 localhost systemd[1]: Failed to start Docker Application Container Engine.
Feb 15 03:17:58 localhost systemd[1]: Unit docker.service entered failed state.
Feb 15 03:17:58 localhost systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
The fatal line above, Error creating cluster component: error while loading TLS C...yet valid, is ellipsized, so the full message is not visible.
- 4. Run
systemctl status docker.service -l
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2021-02-15 03:17:58 UTC; 11min ago
Docs: https://docs.docker.com
Process: 2701 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE)
Main PID: 2701 (code=exited, status=1/FAILURE)
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.862400156Z" level=warning msg="mountpoint for pids not found"
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.862532666Z" level=info msg="Loading containers: start."
Feb 15 03:17:57 localhost dockerd[2701]: ..time="2021-02-15T03:17:57.866933410Z" level=info msg="Firewalld running: false"
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.957136073Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 15 03:17:57 localhost dockerd[2701]: time="2021-02-15T03:17:57.993371676Z" level=info msg="Loading containers: done."
Feb 15 03:17:58 localhost dockerd[2701]: time="2021-02-15T03:17:58.007806591Z" level=fatal msg="Error creating cluster component: error while loading TLS Certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: x509: certificate has expired or is not yet valid"
Feb 15 03:17:58 localhost systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Feb 15 03:17:58 localhost systemd[1]: Failed to start Docker Application Container Engine.
Feb 15 03:17:58 localhost systemd[1]: Unit docker.service entered failed state.
Feb 15 03:17:58 localhost systemd[1]: docker.service failed.
The full message is now visible: Error creating cluster component: error while loading TLS Certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: x509: certificate has expired or is not yet valid
- 5. Run
sudo dockerd --debug
This prints a lot of output, including DEBUG and INFO lines; the last line is a FATA message.
...
DEBU[0001] /sbin/iptables, [--wait -I DOCKER-ISOLATION -i docker0 -o docker_gwbridge -j DROP]
DEBU[0001] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION -i docker_gwbridge -o docker0 -j DROP]
DEBU[0001] /sbin/iptables, [--wait -I DOCKER-ISOLATION -i docker_gwbridge -o docker0 -j DROP]
DEBU[0001] successfully loaded the Root CA: /var/lib/docker/swarm/certificates/swarm-root-ca.crt
FATA[0001] Error creating cluster component: error while loading TLS Certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: x509: certificate has expired or is not yet valid
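The fatal message says the certificate has expired or is not yet valid, so a natural follow-up is to read the certificate's validity window and compare it with the system clock. The following is a minimal sketch, not part of the original procedure. It assumes the `openssl` command-line tool is installed on the CentOS guest, and the helper name `cert_window` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a certificate's validity window and report
# whether it has expired at the current system time.
# Assumes the openssl CLI is installed.
cert_window() {
    cert="$1"
    # -dates prints the notBefore= and notAfter= lines of the certificate.
    openssl x509 -in "$cert" -noout -dates || return 2
    # -checkend 0 exits 0 if the certificate has not expired yet.
    # It does not detect the "not yet valid" case; for that, compare the
    # notBefore= line with the date printed below.
    if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
        echo "status: not expired at $(date -u)"
    else
        echo "status: expired at $(date -u)"
        return 1
    fi
}

# Usage on the affected host (run as root, since the directory is restricted):
# cert_window /var/lib/docker/swarm/certificates/swarm-node.crt
```

If the reported dates look correct but the system date is wrong, the guest clock rather than the certificate may be the thing to fix.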
II. Fix
- 1. Run
cd /var/lib/docker/swarm/certificates/
- 2. Run
ls -lsa
drwxr-xr-x. 2 root root 4096 Feb 10 10:07 .
drwx------. 5 root root 90 Feb 11 02:40 ..
-rw-r--r--. 1 root root 1385 Nov 11 05:18 swarm-node.crt
-rw-------. 1 root root 227 Feb 10 10:07 .swarm-node.key
-rw-------. 1 root root 227 Nov 11 05:18 swarm-node.key
-rw-r--r--. 1 root root 595 Nov 11 05:18 swarm-root-ca.crt
-rw-------. 1 root root 227 Nov 11 05:18 swarm-root-ca.key
- 3. Run
mv .swarm-node.key bak.key
- 4. Run
mv swarm-node.crt swarm-node.bak
- 5. Run
systemctl start docker
The command returned straight to the prompt with no error.
- 6. Run
systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-02-15 03:41:29 UTC; 10min ago
Docs: https://docs.docker.com
Main PID: 3240 (dockerd)
Memory: 19.8M
- 7. Enter the certificates directory again
cd /var/lib/docker/swarm/certificates/
bash: cd: /var/lib/docker/swarm/certificates/: No such file or directory
- 8. Go to the parent directory and list it
cd /var/lib/docker/swarm/ && ls -las
0 drwx------ 2 root root 6 Feb 15 03:41 .
4 drwx--x--x. 10 root root 4096 Feb 15 03:41 ..
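In short: when this error appears, moving the node certificate and key out of the way (in effect, removing /var/lib/docker/swarm/certificates/) lets Docker start again. Step 8 shows that /var/lib/docker/swarm/ is empty afterwards, so the node presumably no longer holds any swarm state and would have to be re-joined to a swarm if one is needed.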
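In the steps above, the renamed files (bak.key and swarm-node.bak) stayed inside the certificates directory, which no longer existed after Docker started (step 7), so those copies were lost. A more cautious variant is to move the whole directory to a location outside /var/lib/docker/swarm/ before starting Docker. The sketch below is an assumption, not the procedure the author ran; the helper name `backup_certs` and the backup location are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: move a certificates directory to a timestamped backup
# outside the swarm state directory and print the backup path.
backup_certs() {
    src="$1"        # e.g. /var/lib/docker/swarm/certificates
    dest_root="$2"  # e.g. /root/swarm-backup (an assumed location)
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    dest="$dest_root/certificates.$(date -u +%Y%m%dT%H%M%SZ)"
    mkdir -p "$dest_root" && mv "$src" "$dest" && echo "$dest"
}

# Usage on the affected host (as root):
# systemctl stop docker
# backup_certs /var/lib/docker/swarm/certificates /root/swarm-backup
# systemctl start docker
```

Keeping the old certificate and keys this way leaves them available for later inspection, for example to check their validity dates.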
Possible causes:
1. The host lost power and rebooted.
2. The host moved to a different IP subnet.