1. Installing and configuring the NVIDIA driver

1.1 Online installation (recommended)

1.1.1 List the supported NVIDIA driver versions

ubuntu-drivers devices
hjw@hjw-pc:~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FB9sv000017AAsd00002297bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-460-server - distro non-free recommended
driver   : nvidia-driver-460 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

== /sys/devices/pci0000:00/0000:00:1c.5/0000:52:00.0 ==
modalias : pci:v00008086d00002723sv00008086sd00000080bc02sc80i00
vendor   : Intel Corporation
manual_install: True
driver   : backport-iwlwifi-dkms - distro free
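
The recommended driver can also be picked out of this listing with a small filter. The following is a minimal sketch, assuming the output format shown above; recommended_driver is a hypothetical helper name and is not part of ubuntu-drivers.

```shell
# Hypothetical helper: reads `ubuntu-drivers devices` output on stdin and
# prints the package name of every driver line whose last word is "recommended".
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Usage on a live system (needs the ubuntu-drivers-common package):
#   ubuntu-drivers devices | recommended_driver
```

For the listing above this would select nvidia-driver-460-server, the package installed explicitly in 1.1.3.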

1.1.2 Install the recommended NVIDIA driver version

sudo ubuntu-drivers autoinstall

1.1.3 Install a specific supported NVIDIA driver version

sudo apt install nvidia-driver-460-server

1.1.4 Uninstall the installed NVIDIA driver

sudo apt remove '^nvidia-.*'
sudo apt-get autoremove

1.2 Offline installation from a downloaded package

1.2.1 Disable the nouveau driver

  • Edit the configuration file
sudo vi /etc/modprobe.d/blacklist.conf

Add the following lines to blacklist.conf:

blacklist nouveau
options nouveau modeset=0
  • Rebuild the initramfs and reboot
sudo update-initramfs -u
sudo reboot
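
After the reboot it is worth confirming that nouveau is no longer loaded before running the installer. The following is a minimal sketch, assuming the usual lsmod layout with the module name in the first column; module_loaded is a hypothetical helper name.

```shell
# Hypothetical helper: reads an lsmod-style listing on stdin and succeeds
# (exit status 0) only if the module named in $1 appears in the first column.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage on a live system:
#   if lsmod | module_loaded nouveau; then echo "nouveau is still loaded"; fi
```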

1.2.2 Download the offline driver installer

Official download page

1.2.3 Install and configure the offline package

  • Step 1
chmod +x NVIDIA-Linux-x86_64-470.63.01.run
  • Step 2
sudo ./NVIDIA-Linux-x86_64-470.63.01.run --no-x-check
  • Step 3: follow the installer's interactive prompts [screenshots 1-5]

  • Step 4
sudo reboot

1.2.4 Verify the installation

nvidia-smi

1.2.5 Uninstall the driver

sudo ./NVIDIA-Linux-x86_64-470.63.01.run --uninstall

2. Installing and configuring CUDA

2.1 Check the supported CUDA version

nvidia-smi
hjw@hjw-pc:~$ nvidia-smi
Sat Mar 20 21:05:49 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T1000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8     4W /  N/A |    232MiB /  3911MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2124      G   /usr/lib/xorg/Xorg                143MiB |
|    0   N/A  N/A      2785      G   /usr/bin/gnome-shell               52MiB |
|    0   N/A  N/A      3430      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      3795      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A      3975      G   /usr/lib/firefox/firefox           25MiB |
|    0   N/A  N/A      4062      G   /usr/lib/firefox/firefox            1MiB |
+-----------------------------------------------------------------------------+

The header shows CUDA Version: 11.2, the highest CUDA version that this driver supports.
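
To read that value in a script rather than by eye, the header line can be filtered. The following is a minimal sketch, assuming the nvidia-smi header format shown above; driver_cuda_version is a hypothetical helper name.

```shell
# Hypothetical helper: reads nvidia-smi output on stdin and prints the value
# of the "CUDA Version" field from the header line.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a live system:
#   nvidia-smi | driver_cuda_version
```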

2.2 Download a supported CUDA version

Official CUDA download page

To match this system, select CUDA 11.2 > Linux > x86_64 > Ubuntu > 18.04 > runfile (local):

wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run

Alternatively, fetch the same file with any download tool: https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run

2.3 Install the supported CUDA version

sudo sh cuda_11.2.0_460.27.04_linux.run

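At the prompts, answer in this order: accept, n (do not install the driver), y, y, y.
Skip the driver during this installation because it was already installed in section 1.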

2.4 Set the CUDA environment variables

gedit ~/.bashrc

Append the following two lines, then reload the file:

export PATH="/usr/local/cuda-11.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"

source ~/.bashrc
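
Two details of these exports may matter in practice. If LD_LIBRARY_PATH is unset, the result ends in a colon, and the dynamic linker treats that empty entry as the current directory. Re-sourcing ~/.bashrc also prepends the same directory again each time. The following is a minimal sketch of one way to avoid both; prepend_path is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: prints $1 (a colon-separated list) with directory $2
# prepended, unless $2 is already an entry. No stray colon if $1 is empty.
prepend_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$2${1:+:$1}" ;;
    esac
}

# Possible use in ~/.bashrc (paths as in this section):
#   export PATH="$(prepend_path "$PATH" /usr/local/cuda-11.2/bin)"
#   export LD_LIBRARY_PATH="$(prepend_path "$LD_LIBRARY_PATH" /usr/local/cuda-11.2/lib64)"
```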

2.5 Verify the CUDA installation

nvcc --version
hjw@hjw-pc:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
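
The toolkit release printed here should not be newer than the CUDA version the driver reported in 2.1. The following is a minimal sketch of that comparison, assuming the nvcc output format shown above and a sort that supports -V (GNU coreutils does); nvcc_release and version_le are hypothetical helper names.

```shell
# Hypothetical helper: reads `nvcc --version` output on stdin and prints
# the toolkit release number (for example 11.2).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical helper: succeeds if version $1 is less than or equal to $2.
# Relies on `sort -V` (version sort) to order the two strings.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on a live system, with 11.2 being the driver's CUDA version from 2.1:
#   version_le "$(nvcc --version | nvcc_release)" 11.2 && echo "toolkit OK"
```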

2.6 Uninstall the installed CUDA version

  • Run the uninstaller
cd /usr/local/cuda-11.2/bin
sudo ./cuda-uninstaller

Or, equivalently, from any directory:

sudo /usr/local/cuda-11.2/bin/cuda-uninstaller

Forced removal (not recommended)

sudo rm -rf /usr/local/cuda-11.2
sudo rm -rf /usr/local/cuda
  • Remove the environment variables
gedit ~/.bashrc

Delete the following two lines, then reload the file:

export PATH="/usr/local/cuda-11.2/bin:$PATH"

export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"

source ~/.bashrc

3. Installing and configuring cuDNN

3.1 Download a supported cuDNN version

Official cuDNN download page

Based on the CUDA version and its release date, choose cuDNN v8.1.0 > cuDNN Library for Linux.

3.2 Install the supported cuDNN version

sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

3.3 Verify the cuDNN installation

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
hjw@hjw-pc:~$ cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */
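
The three macros can be combined into a single version string for use in scripts. The following is a minimal sketch, assuming the header layout shown above; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: reads cudnn_version.h on stdin and prints the version
# as MAJOR.MINOR.PATCHLEVEL, taken from the three #define lines.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }'
}

# Usage on a live system:
#   cudnn_version < /usr/local/cuda/include/cudnn_version.h
```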

3.4 Uninstall the installed cuDNN version

sudo rm -rf /usr/local/cuda/include/cudnn*
sudo rm -rf /usr/local/cuda/lib64/libcudnn*

4. Installing and configuring nvidia-docker

4.1 Environment used for the nvidia-docker installation

Ubuntu 18.04 LTS, Docker version 20.10.5, docker-compose version 1.28.5

Note: see the docker + docker-compose installation and configuration guide.

4.2 Install the official nvidia-docker package

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

sudo apt-get install -y nvidia-docker2

sudo systemctl restart docker
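
The first command above builds the repository path from /etc/os-release: it sources the file and concatenates ID and VERSION_ID, which gives ubuntu18.04 on this system. The following is a minimal sketch that isolates that step so the value can be inspected before it is used in a URL; distribution_id is a hypothetical helper name.

```shell
# Hypothetical helper: sources an os-release file (default /etc/os-release)
# in a subshell and prints ID followed by VERSION_ID, e.g. ubuntu18.04.
# The subshell keeps the sourced variables out of the calling shell.
distribution_id() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Usage on a live system:
#   distribution_id
```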

4.3 Verify the nvidia-docker installation

sudo docker run --rm --gpus all nvidia/cuda:11.2.0-base nvidia-smi
hjw@hjw-pc:~$ sudo docker run --rm --gpus all nvidia/cuda:11.2.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.2.0-base' locally
11.2.0-base: Pulling from nvidia/cuda
f22ccc0b8772: Pull complete 
3cf8fb62ba5f: Pull complete 
e80c964ece6a: Pull complete 
5d59c811e2af: Pull complete 
b4113a5e55be: Pull complete 
a192f484acd8: Pull complete 
Digest: sha256:218afa9c2002be9c4629406c07ae4daaf72a3d65eb3c5a5614d9d7110840a46e
Status: Downloaded newer image for nvidia/cuda:11.2.0-base
Sat Mar 20 13:25:47 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T1000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8     5W /  N/A |    279MiB /  3911MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Check the configuration file. The "graph": "/home/hjw/docker-home" entry sets the Docker storage path (newer Docker releases name this key "data-root").

sudo gedit /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "graph": "/home/hjw/docker-home"
}
sudo systemctl daemon-reload
sudo systemctl restart docker.service
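
Before restarting Docker, a quick check that the file still registers the NVIDIA runtime can catch an accidental edit. The following is a minimal sketch; has_nvidia_runtime is a hypothetical helper name, and it performs a plain text match rather than parsing the JSON, so a positive result is only a hint.

```shell
# Hypothetical helper: reads daemon.json content on stdin and succeeds if it
# mentions "nvidia-container-runtime". Crude text match, not a JSON parser.
has_nvidia_runtime() {
    grep -q '"nvidia-container-runtime"'
}

# Usage on a live system:
#   has_nvidia_runtime < /etc/docker/daemon.json && echo "nvidia runtime configured"
```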

