Environment Setup: Docker + nvidia-docker for Deep Learning Frameworks (GPU)

Letitia96 · Published 2019-10-13 17:50:48

1. Docker CE

Environment: Ubuntu Server 18.04

Install: Docker CE

Reference: Installing Docker CE on Ubuntu 18.04

2. nvidia-docker

The following is based on: NVIDIA Container Toolkit

Make sure you have installed the NVIDIA driver and Docker 19.03 for your Linux distribution. Note that you do not need to install the CUDA toolkit on the host, but the driver needs to be installed.
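
The prerequisites quoted above can be checked before going further. The following is only a rough sketch: `version_ge` and `check_host` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_host (hypothetical helper): report whether the NVIDIA driver tools
# and a Docker daemon of at least version 19.03 are present on this host.
check_host() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the NVIDIA driver does not seem to be installed"
        return 1
    fi
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found"
        return 1
    fi
    # Ask the daemon for its version; this fails if the daemon is not running.
    docker_version=$(docker version --format '{{.Server.Version}}') || return 1
    if version_ge "$docker_version" 19.03; then
        echo "OK: Docker $docker_version"
    else
        echo "Docker $docker_version is older than 19.03"
        return 1
    fi
}
```

Running `check_host` on the target machine prints a single status line; the CUDA toolkit is deliberately not checked, since it is not required on the host.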

2.1 CUDA Toolkit (optional; installing only the NVIDIA driver is enough)

The following is based on: CUDA TOOLKIT DOCUMENTATION

Pre-installation Actions:

The NVIDIA CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads.

Choose the platform you are using and download the NVIDIA CUDA Toolkit.

The CUDA Toolkit contains the CUDA driver and tools needed to create, build and run a CUDA application as well as libraries, header files, CUDA samples source code, and other resources.

Package Manager Installation:

(1) Perform the pre-installation actions.
(2) Install the repository meta-data:
$ sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
(3) Install the CUDA public GPG key.
When installing using the local repo:
$ sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
When installing using the network repo on Ubuntu 18.04/18.10:
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
When installing using the network repo on Ubuntu 16.04:
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
(4) Update the Apt repository cache:
$ sudo apt-get update
(5) Install CUDA:
$ sudo apt-get install cuda
(6) Perform the post-installation actions.

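A problem you may run into:

Reference: Setting up a deep learning environment on Ubuntu 18.04 (TensorFlow CPU/GPU, Keras, PyTorch, PyCharm, Jupyter)

After the GPU driver is installed, the system has to be rebooted to load it. Note that if you installed the driver with the procedure above, a blue screen titled "Perform MOK management" appears during the reboot:
(1) On the blue "Perform MOK management" screen, select "Enroll MOK".
(2) On the "Enroll MOK" screen, select "Continue".
(3) On the "Enroll the key" screen, select "Yes".
(4) Enter the password you set while installing the driver.
(5) You are returned to the blue "Perform MOK management" screen; select the first entry, "Reboot".

After this reboot, the NVIDIA driver is loaded.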

2.2 NVIDIA driver

A simpler way to install it:

ubuntu-drivers devices
sudo ubuntu-drivers autoinstall

Reboot the system.
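
After the reboot it is worth confirming that the driver was actually loaded. Below is a minimal sketch, assuming a Linux host where loaded kernel modules are listed in /proc/modules; `nvidia_module_loaded` is a hypothetical helper, and its optional file argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# nvidia_module_loaded [FILE]: succeed if a module named exactly "nvidia"
# is listed in FILE (default: /proc/modules, one module per line).
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
    # nvidia-smi ships with the driver and prints the GPU and driver version.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    fi
else
    echo "nvidia kernel module is not loaded"
fi
```

If the module is missing after a reboot, the MOK enrollment described earlier is one thing to re-check.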

2.3 nvidia-docker v2

The following is based on: NVIDIA Container Toolkit

Ubuntu 16.04/18.04, Debian Jessie/Stretch/Buster

# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
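
The `distribution` variable in the commands above is simply the `ID` and `VERSION_ID` fields of /etc/os-release joined together, and it selects which repository list is downloaded. The sketch below shows how that string is built; `repo_id` is a hypothetical helper, and its optional file argument is there only for experimenting with a sample os-release file.

```shell
#!/bin/sh
# repo_id [FILE]: print $ID$VERSION_ID from an os-release style file
# (default: /etc/os-release). The file is sourced in a subshell so its
# variables do not leak into the calling shell.
repo_id() {
    file=${1:-/etc/os-release}
    [ -r "$file" ] || return 1
    ( . "$file" && echo "$ID$VERSION_ID" )
}

if distribution=$(repo_id); then
    # On Ubuntu 18.04 the value is "ubuntu18.04".
    echo "repository list: https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
else
    echo "could not read /etc/os-release"
fi
```

Printing the URL first is a cheap way to confirm that the distribution is one of the supported ones before adding the repository.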

2.4 nvidia-container-runtime

The nvidia-docker setup installed above also requires nvidia-container-runtime.

The following is based on: nvidia-container-runtime

Installation: Ubuntu distributions

(1) Install the repository for your distribution by following the instructions here.
(2) Install the nvidia-container-runtime package:
sudo apt-get install nvidia-container-runtime

Docker Engine setup:

Do not follow this section if you installed the nvidia-docker2 package, it already registers the runtime.

To register the nvidia runtime, use the method below that is best suited to your environment.
You might need to merge the new argument with your existing configuration.

Systemd drop-in file:
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Daemon configuration file:
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:
"default-runtime": "nvidia"

Command line:
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
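
For the daemon configuration file method, the optional "default-runtime" key sits at the top level of the same JSON object as "runtimes". The sketch below writes the merged result to a scratch file instead of /etc/docker/daemon.json so it can be inspected first. The scratch path, the `out_file` variable, and the use of python3 for validation are assumptions made for illustration; any existing keys in your own daemon.json would need to be merged in by hand.

```shell
#!/bin/sh
# Write the merged configuration to a scratch file first
# (the real target would be /etc/docker/daemon.json).
out_file=$(mktemp)
cat > "$out_file" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

# Validate the JSON before installing it (assumes python3 is available).
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$out_file" >/dev/null && echo "valid JSON: $out_file"
fi

# To install and verify (left commented out because it changes the system):
#   sudo cp "$out_file" /etc/docker/daemon.json
#   sudo systemctl restart docker
#   docker info --format '{{.DefaultRuntime}}'
```

With "default-runtime" set to "nvidia", containers use the NVIDIA runtime even when `--runtime=nvidia` is not passed on the command line.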

3. TensorFlow Docker

The following is based on: [Official] TensorFlow/install/Docker

Note: nvidia-docker v1 uses the nvidia-docker alias, whereas v2 uses docker --runtime=nvidia.

Start a bash shell session in a container using the latest TensorFlow GPU image:
docker run --runtime=nvidia -it tensorflow/tensorflow:latest-gpu bash
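
Instead of opening an interactive shell, the same image can run a one-off command as a quick smoke test. The sketch below wraps that in a small function, assuming the nvidia runtime has been registered as described in section 2.4. `tf_gpu_smoke_test` and the `DOCKER` override variable are hypothetical names added here so the command can be previewed without starting a container.

```shell
#!/bin/sh
# tf_gpu_smoke_test: run a tiny TensorFlow computation inside the GPU image.
# Set DOCKER=echo to print the docker arguments instead of running them.
tf_gpu_smoke_test() {
    "${DOCKER:-docker}" run --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu \
        python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
}

# Preview the arguments without contacting the Docker daemon:
DOCKER=echo tf_gpu_smoke_test

# Run it for real (pulls the image on first use):
#   tf_gpu_smoke_test
```

If the runtime is not registered, the real run fails early with an unknown-runtime error from Docker, which points back to the Docker Engine setup step.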
