Docker + nvidia-docker: deep learning frameworks on GPU

1. Docker CE

Environment: Ubuntu Server 18.04

Install: Docker CE

Reference: Installing Docker CE on Ubuntu 18.04

2. nvidia-docker

The following is based on: NVIDIA Container Toolkit

Make sure you have installed the NVIDIA driver and Docker 19.03 for your Linux distribution. Note that you do not need to install the CUDA toolkit on the host, but the driver needs to be installed.
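
The two prerequisites in the note above can be checked before going further. The following is a minimal sketch, assuming a sort(1) that supports -V (GNU coreutils, as shipped with Ubuntu); version_ge and check_prereqs are hypothetical helper names, not part of Docker or the NVIDIA tooling.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes a sort(1) that supports -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs (hypothetical helper): report on the driver and on Docker.
check_prereqs() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "driver: nvidia-smi found"
    else
        echo "driver: nvidia-smi not found, install the NVIDIA driver first"
    fi

    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not installed"
        return 0
    fi
    # The server version comes back empty when the daemon cannot be reached.
    docker_version=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
    if [ -n "$docker_version" ] && version_ge "$docker_version" 19.03; then
        echo "docker: $docker_version (>= 19.03)"
    else
        echo "docker: '${docker_version:-unknown}' is older than 19.03 or the daemon is not reachable"
    fi
}

check_prereqs
```

The script only reports; it installs nothing, so it is safe to run before and after each of the steps below.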

2.1 CUDA Toolkit (optional; installing only the NVIDIA driver is enough)

The following is based on: CUDA TOOLKIT DOCUMENTATION

Pre-installation Actions:

The NVIDIA CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads.

Choose the platform you are using and download the NVIDIA CUDA Toolkit

The CUDA Toolkit contains the CUDA driver and tools needed to create, build and run a CUDA application as well as libraries, header files, CUDA samples source code, and other resources.

Package Manager Installation:

  1. Perform the pre-installation actions.

  2. Install repository meta-data

    $ sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
    
  3. Install the CUDA public GPG key

    When installing using the local repo:

    $ sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
    

    When installing using network repo on Ubuntu 18.04/18.10:

    $ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
    

    When installing using network repo on Ubuntu 16.04:

    $ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
    
  4. Update the Apt repository cache

    $ sudo apt-get update
    
  5. Install CUDA

    $ sudo apt-get install cuda
    
  6. Perform the post-installation actions.
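
The <distro> and <architecture> placeholders in the network repo key URL can be filled in from the running system. The following is a rough sketch, assuming the repository directory is named by joining ID and VERSION_ID from /etc/os-release without the dot (for example ubuntu1804) and that the architecture directory matches uname -m (for example x86_64). Both are assumptions to verify against the download page; cuda_repo_distro and cuda_key_url are hypothetical helper names.

```shell
#!/bin/sh
# cuda_repo_distro ID VERSION_ID: "ubuntu" "18.04" -> "ubuntu1804"
# (assumed naming scheme of the CUDA network repository directories)
cuda_repo_distro() {
    printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d '.')"
}

# cuda_key_url DISTRO ARCH: build the public key URL used in step 3.
cuda_key_url() {
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/7fa2af80.pub\n' "$1" "$2"
}

# Print the URL for the current machine; nothing is downloaded or installed.
if [ -r /etc/os-release ]; then
    (
        . /etc/os-release
        cuda_key_url "$(cuda_repo_distro "$ID" "$VERSION_ID")" "$(uname -m)"
    )
fi
```

The printed URL is the one to pass to sudo apt-key adv --fetch-keys in step 3.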

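Possible problems

Reference: Setting up a deep learning environment on Ubuntu 18.04 (TensorFlow CPU/GPU, Keras, PyTorch, PyCharm, Jupyter)

After the graphics driver is installed, the system must be rebooted to load it. Note that if the driver was installed by following the steps above, a blue screen titled Perform MOK management appears during the reboot (this happens on systems with UEFI Secure Boot enabled):
(1) On the blue Perform MOK management screen, select Enroll MOK.
(2) On the Enroll MOK screen, select Continue.
(3) On the Enroll the key screen, select Yes.
(4) Enter the password you set during driver installation.
(5) You are returned to the blue Perform MOK management screen; select the first option, Reboot.

After this reboot, the NVIDIA driver is loaded.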
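
Whether the key enrollment worked can be checked after the reboot. The following is a minimal sketch, assuming the mokutil tool is installed (it normally is on Ubuntu systems booted with Secure Boot); nvidia_module_loaded is a hypothetical helper that looks for the nvidia kernel module in /proc/modules.

```shell
#!/bin/sh
# nvidia_module_loaded [FILE]: succeeds when the "nvidia" kernel module is
# listed in FILE (defaults to /proc/modules). Related modules such as
# nvidia_uvm alone do not count.
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

# Report the Secure Boot state when mokutil is available.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state 2>/dev/null || echo "Secure Boot state could not be read"
fi

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded"
fi
```

If Secure Boot is enabled and the module is not loaded, the MOK enrollment was most likely skipped or the wrong password was entered.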

2.2 nvidia driver

Quick installation:

ubuntu-drivers devices
sudo ubuntu-drivers autoinstall

Reboot the system.
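
To see which package autoinstall is likely to choose, the output of ubuntu-drivers devices can be filtered for the entry marked recommended. This is a small sketch, assuming the output contains lines of the form "driver   : nvidia-driver-NNN - distro non-free recommended"; the exact format may differ between releases, and recommended_driver is a hypothetical helper name.

```shell
#!/bin/sh
# recommended_driver: read `ubuntu-drivers devices` output on stdin and print
# the package name from the first "driver" line marked "recommended".
# Assumed line format: "driver   : <package> - distro non-free recommended"
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices 2>/dev/null | recommended_driver
fi
```

After the reboot, running nvidia-smi on the host should list the GPU and the driver version.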

2.3 nvidia-docker v2

The following is based on: NVIDIA Container Toolkit

Ubuntu 16.04/18.04, Debian Jessie/Stretch/Buster

# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
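
The distribution variable above is built by sourcing /etc/os-release and joining ID and VERSION_ID, which gives ubuntu18.04 on this system. The sketch below shows that expansion in isolation so the repository URL can be inspected before it is added; nvidia_docker_list_url is a hypothetical helper name, and the optional file argument exists only to make the function easy to try out.

```shell
#!/bin/sh
# nvidia_docker_list_url [OS_RELEASE_FILE]: print the nvidia-docker.list URL
# for the distribution described by the given os-release file
# (defaults to /etc/os-release).
nvidia_docker_list_url() {
    distribution=$(. "${1:-/etc/os-release}" && echo "$ID$VERSION_ID")
    echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
}

if [ -r /etc/os-release ]; then
    nvidia_docker_list_url
fi
```

Once the toolkit is installed and Docker has been restarted, a quick smoke test is to run nvidia-smi inside a CUDA container with docker run --gpus all --rm IMAGE nvidia-smi, where IMAGE is any CUDA base image available to you (the specific image tag is left open here on purpose).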

2.4 nvidia-container-runtime

With the nvidia-docker version installed above, nvidia-container-runtime must also be installed.

The following is based on: nvidia-container-runtime

Installation: Ubuntu distributions

  1. Install the repository for your distribution by following the instructions here.

  2. Install the nvidia-container-runtime package:

    sudo apt-get install nvidia-container-runtime
    

Docker Engine setup

Do not follow this section if you installed the nvidia-docker2 package; it already registers the runtime.

To register the nvidia runtime, use the method below that is best suited to your environment.
You might need to merge the new argument with your existing configuration.

Systemd drop-in file

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Daemon configuration file

sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

"default-runtime": "nvidia"

Command line

sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
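
For the daemon configuration file method, the optional "default-runtime" key sits at the top level of the JSON object, next to "runtimes". The sketch below writes a complete file to a temporary path so it can be reviewed first; it assumes there is no existing /etc/docker/daemon.json whose settings would need to be merged in, and write_daemon_json is a hypothetical helper name.

```shell
#!/bin/sh
# write_daemon_json PATH: write a complete daemon.json that registers the
# nvidia runtime and makes it the default runtime.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Write to a temporary file for review instead of touching /etc/docker.
out=$(mktemp)
write_daemon_json "$out"
cat "$out"
```

After reviewing the file, copy it to /etc/docker/daemon.json and restart Docker with sudo systemctl restart docker; docker info | grep -i runtime should then list nvidia among the runtimes.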

3. TensorFlow docker

The following is based on: [Official] TensorFlow/install/Docker

Note that nvidia-docker v1 uses the nvidia-docker alias, while v2 uses docker run --runtime=nvidia.

Start a bash shell session in a container using the latest TensorFlow GPU image:

docker run --runtime=nvidia -it tensorflow/tensorflow:latest-gpu bash
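
To confirm that TensorFlow inside the container actually sees the GPU, a one-off check can be run instead of an interactive shell. The sketch below only prints the command for each invocation style, so nothing is pulled or started by accident. It assumes the latest-gpu image ships a python executable and a TensorFlow version that provides tf.config.list_physical_devices (TensorFlow 2.1 or later); tf_gpu_check_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# tf_gpu_check_cmd MODE: print a docker command that lists the GPUs visible
# to TensorFlow inside the container.
#   v1    nvidia-docker v1 alias
#   v2    docker run --runtime=nvidia (the form used in this post; default)
#   gpus  docker run --gpus all (Docker 19.03+ with nvidia-container-toolkit)
tf_gpu_check_cmd() {
    check="import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    image="tensorflow/tensorflow:latest-gpu"
    case "$1" in
        v1)   prefix="nvidia-docker run" ;;
        gpus) prefix="docker run --gpus all" ;;
        *)    prefix="docker run --runtime=nvidia" ;;
    esac
    printf '%s --rm %s python -c "%s"\n' "$prefix" "$image" "$check"
}

tf_gpu_check_cmd v2
```

Copy the printed command into a terminal to run it. An empty list in the output means the container started but TensorFlow found no GPU, which usually points back to the driver or runtime setup in section 2.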

4. References

  1. Installing Docker CE on Ubuntu 18.04
  2. NVIDIA Container Toolkit
  3. CUDA TOOLKIT DOCUMENTATION
  4. Setting up a deep learning environment on Ubuntu 18.04 (TensorFlow CPU/GPU, Keras, PyTorch, PyCharm, Jupyter)
  5. NVIDIA Container Toolkit
  6. nvidia-container-runtime
  7. [Official] TensorFlow/install/Docker