Contents

1. Host environment

2. Install Docker

3. Install nvidia-docker (required if you want to use the NVIDIA driver inside Docker containers)

4. Pull the image

4.1 Verify the --gpus option

4.2 Run a GPU-enabled Ubuntu container

4.3 Write a script to pull the image and start the container

4.4 Run the script

5. Install CUDA

5.1 Download the .run installer (recommended) and follow the prompts

5.2 After installation, set the environment variables

6. Install cuDNN

6.1 Download the installation file

6.2 Install cuDNN

6.3 Check the cuDNN version


1. Host environment

OS: Ubuntu 20.04

GPU driver: nvidia-driver-418-server

CUDA version: CUDA 10.1

cuDNN version: cuDNN 7.6.4
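
If you want to double-check these versions on your own host before starting, the commands below are a quick sanity check (the cuDNN header path assumes a default install under /usr/local/cuda):

# GPU and driver visibility
nvidia-smi

# CUDA toolkit version, if the toolkit is installed on the host
nvcc --version

# cuDNN version, assuming the header was copied to the default CUDA include directory
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2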

2. Install Docker

The install command is as follows:

curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
Alternatively, you can use the DaoCloud one-line install command (a domestic mirror):

curl -sSL https://get.daocloud.io/docker | sh

Test:
docker run hello-world
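
Optionally, if you want to run docker without sudo, the standard post-install step is to add your user to the docker group and then log out and back in; this is optional, and the rest of this guide keeps using sudo:

sudo groupadd docker          # the group may already exist
sudo usermod -aG docker $USER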

3. Install nvidia-docker (required if you want to use the NVIDIA driver inside Docker containers)

 

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi

After nvidia-docker is installed, a daemon.json file is created automatically under /etc/docker. Edit daemon.json as follows:

lu@computer:~$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
   "default-runtime": "nvidia",
   "registry-mirrors": [
	    "https://kfwkfulq.mirror.aliyuncs.com",
	    "https://2lqq34jg.mirror.aliyuncs.com",
	    "https://pee6w651.mirror.aliyuncs.com",
	    "https://registry.docker-cn.com",
	    "https://hub-mirrot.c.163.com"
    ],
    "dns": ["8.8.8.8", "8.8.4.4"]
}
lu@computer:~$

After editing, restart the Docker service:

sudo systemctl restart docker
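
To confirm that the runtime configuration was picked up, you can inspect the daemon settings (the exact wording of the output varies between Docker versions):

docker info | grep -i runtime
# Expect "nvidia" to appear in the list of runtimes and as the default runtime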

4. Pull the image

4.1 Verify the --gpus option

$ docker run --help | grep -i gpus
      --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)

4.2 Run a GPU-enabled Ubuntu container

 $ docker run -it --rm --gpus all ubuntu nvidia-smi
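
The --gpus flag also accepts a GPU count or specific device IDs, which is handy on multi-GPU hosts; the device index below is only an example:

# Expose only GPU 0 to the container
docker run -it --rm --gpus '"device=0"' ubuntu nvidia-smi

# Expose any two GPUs
docker run -it --rm --gpus 2 ubuntu nvidia-smi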

Troubleshooting
Did you run into the following error message?

$ docker run -it --rm --gpus all ubuntu
docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].

The error above means that the NVIDIA runtime is not registered with Docker. In practice it means either that the driver is not installed correctly on the host, or that the NVIDIA container toolkit was installed without restarting the Docker daemon, in which case you need to restart the Docker daemon.

It is recommended to go back and verify that nvidia-container-runtime is installed, or to restart the Docker daemon.

Install nvidia-container-runtime:

$ apt-get install nvidia-container-runtime
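
After installing it, restart the Docker daemon and re-run the test (this assumes systemd manages Docker, as it does on Ubuntu 20.04):

sudo systemctl restart docker
docker run -it --rm --gpus all ubuntu nvidia-smi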

4.3 Write a script to pull the image and start the container:

If all the steps above pass, you can now pull the image.

lu@computer:~/docker_home/ubuntu16.04_nvidia$ pwd
/home/lu/docker_home/ubuntu16.04_nvidia
lu@computer:~/docker_home/ubuntu16.04_nvidia$ 
lu@computer:~/docker_home/ubuntu16.04_nvidia$ cat run-ubuntu16.04_nvidia_docker.sh 
#!/bin/bash

# Name the container after the current user so each user gets their own container
export MY_CONTAINER="ubuntu16.04_nvidia-`whoami`"
# Check whether a container with this name already exists
num=`sudo docker ps -a|grep -w "$MY_CONTAINER$"|wc -l`
echo $num
echo $MY_CONTAINER
if [ 0 -eq $num ];then
  # First run: allow X11 access and create the container (pulls ubuntu:16.04 if needed)
  sudo xhost +
  sudo docker run \
    -e DISPLAY=unix$DISPLAY --net=host --ipc=host --pid=host \
    -it --runtime=nvidia --privileged --name $MY_CONTAINER \
    -v $PWD:/home/share --gpus all ubuntu:16.04 bash
else
  # The container already exists: start it and attach a shell
  sudo docker start $MY_CONTAINER
  sudo docker exec -ti $MY_CONTAINER /bin/bash
fi
lu@computer:~/docker_home/ubuntu16.04_nvidia$

4.4 Run the script

./run-ubuntu16.04_nvidia_docker.sh
# This pulls the ubuntu:16.04 image automatically and drops you into the container
root@computer:/home/share# nvidia-smi 
Sun May 23 01:25:39 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.197.02   Driver Version: 418.197.02   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940M        Off  | 00000000:04:00.0 Off |                  N/A |
| N/A   57C    P0    N/A /  N/A |    374MiB /  2004MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       773      G   /usr/lib/xorg/Xorg                           109MiB |
|    0      1449      G   /usr/bin/gnome-shell                         111MiB |
|    0      1887      G   ...AAgAAAAAAAAACAAAAAAAAAA= --shared-files   150MiB |
+-----------------------------------------------------------------------------+
root@computer:/home/share# 

As you can see, the Docker container can now use the GPU driver.

Next, install CUDA 10.1 and cuDNN 7.6.4 inside the container, just as you would on the host.
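
One assumption worth stating: the stock ubuntu:16.04 image is very minimal, so inside the container you will likely need a few basic packages before running the installers below (the exact package list is an assumption; gcc/g++ are only needed if you later compile CUDA code or samples):

apt-get update
apt-get install -y wget gcc g++ make libxml2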

5. Install CUDA

Download link: https://developer.nvidia.com/cuda-toolkit-archive

5.1 Download the .run installer (recommended) and follow the prompts

Run the following command:

sudo bash cuda_10.1.105_418.39_linux.run

Hold the Enter key until the license agreement scrolls to 100%, then make the following choices:

accept

n (do not install the driver; the host driver is already in use)

y

y

y

Error: libxml2 is missing

root@computer:/home/share# bash cuda_10.1.105_418.39_linux.run 
./cuda-installer: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
root@computer:/home/share# apt install libxml2
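
After installing libxml2, re-run the installer. If you prefer to skip the interactive prompts entirely, the CUDA 10.1 runfile also accepts command-line flags for an unattended, toolkit-only install; the flags below are based on the 10.1 installer and may differ for other versions, so treat this as a sketch:

sh cuda_10.1.105_418.39_linux.run --silent --toolkit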

 

5.2 After installation, set the environment variables

Open the .bashrc file in your home directory and add the following paths; for example, my .bashrc is under /home/lu/.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64
export PATH=$PATH:/usr/local/cuda-10.1/bin
export CUDA_HOME=/usr/local/cuda-10.1

Run in the terminal: source ~/.bashrc

Check: nvcc --version

 

6. Install cuDNN

6.1 Download the installation file

Download the cuDNN package that matches your CUDA version from: https://developer.nvidia.com/rdp/cudnn-archive
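
For this setup that means the "cuDNN Library for Linux" tarball for CUDA 10.1, cuDNN 7.6.4. The exact file name depends on the build you download; assuming a name along the lines of the one below, extract it with:

tar -xzvf cudnn-10.1-linux-x64-v7.6.4.38.tgz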

6.2 Install cuDNN

After extracting the downloaded archive you will see a cuda folder. Open a terminal in that directory and run the following commands:

sudo cp cuda/include/cudnn* /usr/local/cuda/include/

sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/

sudo chmod a+r /usr/local/cuda/include/cudnn*

sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
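
One optional refinement: the lib64 directory in the tarball contains symbolic links (libcudnn.so -> libcudnn.so.7 -> libcudnn.so.7.6.4), and a plain cp copies them as full duplicate files. If you would rather preserve the links, copy the libraries with -P instead:

sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/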

6.3 Check the cuDNN version

Run in the terminal:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
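
For cuDNN 7.6.4 the output should include version macros along these lines (the surrounding lines may vary between header versions):

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4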

 

