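While I was running a program today, the server connection suddenly dropped. After reconnecting to the server, I got this error when trying to attach to the container: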

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

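Searching on Baidu, I found people saying this happens when the Docker service is not running, so it needs to be started.
I followed the top-voted answer and tried to start Docker, but that failed too, with this error: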

Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
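
Before digging into the logs, it can help to confirm what the first error is complaining about: the client cannot reach the daemon's socket. The following is only a minimal sketch; the helper names socket_state and service_state are made up for illustration, and the socket path is the one from the error message.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists and is a Unix socket.
socket_state() {
    if [ -S "$1" ]; then
        echo "socket present"
    elif [ -e "$1" ]; then
        echo "exists but is not a socket"
    else
        echo "missing"
    fi
}

# Hypothetical helper: ask systemd for the unit state, if systemctl exists.
# `systemctl is-active` exits non-zero for inactive/failed units, hence `|| true`.
service_state() {
    if command -v systemctl >/dev/null 2>&1; then
        systemctl is-active "$1" 2>/dev/null || true
    else
        echo "systemctl not available"
    fi
}

echo "socket:  $(socket_state /var/run/docker.sock)"
echo "service: $(service_state docker.service)"
```

If the service reports failed rather than inactive, simply starting it again is unlikely to help, which matches what happened here.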

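More searching showed that the message above is telling you to look at the error details in the logs. One answer about errors when starting Docker listed many similar failure messages,
but mine was not among them, so I had to dig through the log myself.
Here is the output, pasted as text: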

whut@whut-Precision-T7600:~$ systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─override.conf
   Active: failed (Result: start-limit-hit) since 一 2019-01-21 19:56:02 CST; 58s ago
     Docs: https://docs.docker.com
  Process: 3886 ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime (code=exited, status=1/FAILURE)
 Main PID: 3886 (code=exited, status=1/FAILURE)

1月 21 19:56:02 whut-Precision-T7600 systemd[1]: Failed to start Docker Application Container Engine.
1月 21 19:56:02 whut-Precision-T7600 systemd[1]: docker.service: Unit entered failed state.
1月 21 19:56:02 whut-Precision-T7600 systemd[1]: docker.service: Failed with result 'start-limit-hit'.
1月 21 19:56:28 whut-Precision-T7600 systemd[1]: docker.service: Start request repeated too quickly.
1月 21 19:56:28 whut-Precision-T7600 systemd[1]: Failed to start Docker Application Container Engine.
1月 21 19:56:28 whut-Precision-T7600 systemd[1]: docker.service: Failed with result 'start-limit-hit'.
1月 21 19:56:48 whut-Precision-T7600 systemd[1]: Stopped Docker Application Container Engine.
1月 21 19:56:48 whut-Precision-T7600 systemd[1]: docker.service: Start request repeated too quickly.
1月 21 19:56:48 whut-Precision-T7600 systemd[1]: Failed to start Docker Application Container Engine.
1月 21 19:56:48 whut-Precision-T7600 systemd[1]: docker.service: Failed with result 'start-limit-hit'.
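
A note on reading this output: start-limit-hit only means systemd gave up after too many start attempts in a short time. The real reason dockerd exited with status 1 is usually further back in the journal. Below is a rough sketch of how one might pull out the result and then look at the earlier log lines; extract_result and show_docker_logs are hypothetical helper names, and the sample is the ASCII part of the Active line above.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Result: ..." value from an Active line
# of `systemctl status` output read on stdin.
extract_result() {
    sed -n 's/.*(Result: \([^)]*\)).*/\1/p'
}

# Hypothetical helper (defined, not called here): show the last 50 journal
# lines for the Docker unit, where the original dockerd error should appear.
show_docker_logs() {
    journalctl -u docker.service --no-pager -n 50
}

sample='Active: failed (Result: start-limit-hit)'
printf '%s\n' "$sample" | extract_result

# After fixing the underlying problem, the rate-limit state can be cleared with:
#   sudo systemctl reset-failed docker.service
```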

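The output above mentions an override.conf file, and the problem seemed to be there. Following that lead, I found an issue very similar to mine: https://github.com/NVIDIA/nvidia-docker/issues/761
Following the suggestions there, I first looked at the contents of /etc/systemd/system/docker.service.d/override.conf: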

whut@whut-Precision-T7600:~$ cat /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
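
For context, the empty ExecStart= line in a drop-in clears the command inherited from the packaged unit, and the next ExecStart= line sets the replacement. The sketch below imitates that reset behaviour with awk on two temporary files. It is an illustration only: effective_execstart is a made-up helper, and the base unit's ExecStart line is an assumption, since the post does not show the packaged docker.service.

```shell
#!/bin/sh
# Hypothetical helper: print the ExecStart commands that remain after applying
# the files in order, treating an empty "ExecStart=" as "reset the list".
effective_execstart() {
    awk '
        /^ExecStart=[ \t]*$/ { n = 0; next }
        /^ExecStart=/        { sub(/^ExecStart=/, ""); cmds[++n] = $0 }
        END { for (i = 1; i <= n; i++) print cmds[i] }
    ' "$@"
}

base=$(mktemp)
dropin=$(mktemp)
# Assumed content of the packaged unit (not taken from the post).
printf '%s\n' '[Service]' 'ExecStart=/usr/bin/dockerd -H fd://' > "$base"
# Content of override.conf as shown above.
printf '%s\n' '[Service]' 'ExecStart=' \
    'ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime' > "$dropin"

effective_execstart "$base" "$dropin"
rm -f "$base" "$dropin"
```

On a real system, systemctl cat docker.service shows the packaged unit together with its drop-ins.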

Then I used dpkg to check which package, if any, owns that file (I do not fully understand this step):

whut@whut-Precision-T7600:~$ dpkg -S /etc/systemd/system/docker.service.d/override.conf
dpkg-query: no path found matching pattern /etc/systemd/system/docker.service.d/override.conf
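
As far as I can tell, dpkg -S searches installed packages for a file path, so a "no path found" answer suggests the file was not installed by any package and was created by hand or by some setup script. A small sketch of that check, with file_owner as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the owning package of a path, or say that no
# installed package claims it. Falls back gracefully when dpkg is absent.
file_owner() {
    if ! command -v dpkg >/dev/null 2>&1; then
        echo "dpkg not available"
        return 0
    fi
    if out=$(dpkg -S "$1" 2>/dev/null); then
        # dpkg -S prints "package: /path"; keep the part before ": ".
        echo "${out%%: *}"
    else
        echo "not owned by any package"
    fi
}

file_owner /etc/systemd/system/docker.service.d/override.conf
```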

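After that, following one answer's suggestion, I ran systemctl edit docker to open the override.conf file shown earlier and deleted --add-runtime=nvidia=/usr/bin/nvidia-container-runtime from it. That fixed the problem. Note that on my system this command opens the file in the GNU nano editor, so remember to save and exit after making the change (see a GNU nano usage guide if needed).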
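
The edit itself amounts to dropping one argument from the ExecStart line. The sketch below shows that change as a sed filter on the line from override.conf; strip_nvidia_runtime is a hypothetical helper, and the commented commands afterwards are what I would expect to need on a systemd machine, not something the original answer spelled out.

```shell
#!/bin/sh
# Hypothetical helper: remove the --add-runtime=nvidia=... argument from
# text read on stdin, leaving the rest of the line untouched.
strip_nvidia_runtime() {
    sed 's# --add-runtime=nvidia=[^ ]*##'
}

line='ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime'
printf '%s\n' "$line" | strip_nvidia_runtime

# After saving the edited override.conf, something like this should bring
# the daemon back (run manually, requires root):
#   sudo systemctl reset-failed docker.service   # clear the start-limit state
#   sudo systemctl restart docker
#   systemctl is-active docker
```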
