Installing, Configuring, and Deploying the NVIDIA Driver, CUDA, cuDNN, and nvidia-docker (Ubuntu 18.04 LTS)
1. Installing and configuring the NVIDIA driver
1.1 Online installation (recommended)
1.1.1 List the supported NVIDIA driver versions
ubuntu-drivers devices
hjw@hjw-pc:~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FB9sv000017AAsd00002297bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-460-server - distro non-free recommended
driver : nvidia-driver-460 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
== /sys/devices/pci0000:00/0000:00:1c.5/0000:52:00.0 ==
modalias : pci:v00008086d00002723sv00008086sd00000080bc02sc80i00
vendor : Intel Corporation
manual_install: True
driver : backport-iwlwifi-dkms - distro free
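When scripting the installation, the recommended package can be pulled out of this listing. The following is a minimal sketch that assumes the output format shown above; the function name `recommended_driver` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print the package that `ubuntu-drivers devices`
# marks as "recommended". Reads the command's output on stdin.
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Typical use (requires the ubuntu-drivers tool):
#   ubuntu-drivers devices | recommended_driver
```

With the listing above, the helper would print the package name that section 1.1.3 installs by hand.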
1.1.2 Install the recommended NVIDIA driver version
sudo ubuntu-drivers autoinstall
1.1.3 Install a specific supported NVIDIA driver version
sudo apt install nvidia-driver-460-server
1.1.4 Uninstall the installed NVIDIA driver
sudo apt remove 'nvidia*'
sudo apt-get autoremove
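To confirm that the removal left nothing behind, the installed-package list can be filtered. This is a small sketch, assuming the usual `dpkg -l` column layout; the helper name `installed_nvidia_packages` is made up for illustration.

```shell
# Hypothetical helper: list installed packages whose name contains "nvidia".
# Reads `dpkg -l` output on stdin; "ii" in column 1 marks an installed package.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Typical use:
#   dpkg -l | installed_nvidia_packages
```

An empty result suggests that no NVIDIA packages remain installed.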
1.2 Offline package installation
1.2.1 Disable the nouveau driver
- Edit the configuration file
sudo vi /etc/modprobe.d/blacklist.conf
Add the following lines to blacklist.conf:
blacklist nouveau
options nouveau modeset=0
- Rebuild the initramfs and reboot so the change takes effect
sudo update-initramfs -u
sudo reboot
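After the reboot, it is worth checking that the nouveau module is no longer loaded before running the NVIDIA installer. A minimal sketch, assuming standard `lsmod` output; `nouveau_loaded` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) if a module named exactly "nouveau"
# appears in lsmod-style text read from stdin, fail (exit 1) otherwise.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Typical use:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded; re-check blacklist.conf"
#   fi
```

Matching on the first column avoids false positives from other modules that merely list nouveau in their "Used by" column.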
1.2.2 Download the offline driver installer
1.2.3 Install and configure the offline installer
- Step 1
chmod +x NVIDIA-Linux-x86_64-470.63.01.run
- Step 2
sudo ./NVIDIA-Linux-x86_64-470.63.01.run --no-x-check
- Step 3
[Screenshots 1-5: installer prompts]
- Step 4
sudo reboot
1.2.4 Verify the installation
nvidia-smi
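Beyond eyeballing the table, the loaded driver version can be compared with the version encoded in the installer's file name. The sketch below assumes the `NVIDIA-Linux-x86_64-<version>.run` naming used in this section; `installer_version` is a hypothetical helper.

```shell
# Hypothetical helper: extract the version from an installer file name such as
# NVIDIA-Linux-x86_64-470.63.01.run. Runs in a subshell so `name` does not leak.
installer_version() (
    name=${1##*/}        # strip any leading directory
    name=${name%.run}    # strip the .run extension
    printf '%s\n' "${name##*-}"   # keep what follows the last hyphen
)

# Typical use (requires a working driver):
#   [ "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" \
#       = "$(installer_version NVIDIA-Linux-x86_64-470.63.01.run)" ] && echo "driver matches"
```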
1.2.5 Uninstall the driver
sudo ./NVIDIA-Linux-x86_64-470.63.01.run --uninstall
2. Installing and configuring CUDA
2.1 Check the supported CUDA version
nvidia-smi
hjw@hjw-pc:~$ nvidia-smi
Sat Mar 20 21:05:49 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro T1000 Off | 00000000:01:00.0 Off | N/A |
| N/A 42C P8 4W / N/A | 232MiB / 3911MiB | 7% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2124 G /usr/lib/xorg/Xorg 143MiB |
| 0 N/A N/A 2785 G /usr/bin/gnome-shell 52MiB |
| 0 N/A N/A 3430 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 3795 G /usr/lib/firefox/firefox 3MiB |
| 0 N/A N/A 3975 G /usr/lib/firefox/firefox 25MiB |
| 0 N/A N/A 4062 G /usr/lib/firefox/firefox 1MiB |
+-----------------------------------------------------------------------------+
The header of the output shows the highest CUDA version this driver supports:
CUDA Version: 11.2
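For scripted checks, the same value can be read from the header line. A minimal sketch, assuming the `CUDA Version: X.Y` header format shown above; `max_cuda_version` is a hypothetical name.

```shell
# Hypothetical helper: print the "CUDA Version" reported in the nvidia-smi
# header. Reads nvidia-smi output on stdin and prints the first match.
max_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Typical use (requires a working driver):
#   nvidia-smi | max_cuda_version
```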
2.2 Download a supported CUDA version
To match the system above, select CUDA 11.2, Linux, x86_64, Ubuntu, 18.04, runfile (local) on the download page:
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
Alternatively, fetch the same file with any download tool: https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
2.3 Install the CUDA toolkit
sudo sh cuda_11.2.0_460.27.04_linux.run
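Answer the installer prompts in order: accept, n (do not install the driver), y, y, y.
Do not install the driver during this step; it was already installed in section 1.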
2.4 Set the CUDA environment variables
gedit ~/.bashrc
Append the following two lines to ~/.bashrc:
export PATH="/usr/local/cuda-11.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"
source ~/.bashrc
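Because ~/.bashrc is sourced by every new shell, unconditional exports like the ones above keep prepending the same directory. One possible way to avoid the duplicates is sketched below; `prepend_path` is a hypothetical helper, not something the CUDA installer provides.

```shell
# Hypothetical helper: print a colon-separated list with directory $1 placed
# in front of list $2, unless $1 is already an entry of $2.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already present: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # add ":" only if $2 is non-empty
    esac
}

# Possible use in ~/.bashrc:
#   PATH=$(prepend_path /usr/local/cuda-11.2/bin "$PATH"); export PATH
#   LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda-11.2/lib64 "$LD_LIBRARY_PATH")
#   export LD_LIBRARY_PATH
```

The `${2:+:$2}` expansion also avoids a trailing colon when LD_LIBRARY_PATH starts out empty; an empty entry in that list is treated as the current directory.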
2.5 Verify the CUDA installation
nvcc --version
hjw@hjw-pc:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
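A scripted check can confirm that the installed toolkit is not newer than the CUDA version the driver reports in section 2.1. The sketch below assumes the `release X.Y,` wording shown above; both function names are hypothetical.

```shell
# Hypothetical helper: print the toolkit release from `nvcc --version` output
# read on stdin (for example "11.2").
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if dotted version $1 <= dotted version $2,
# comparing each component numerically (so 9.2 <= 11.2).
version_le() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Typical use (11.2 is the driver-side value from section 2.1):
#   version_le "$(nvcc --version | nvcc_release)" 11.2 && echo "toolkit is supported"
```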
2.6 Uninstall CUDA
- Run the uninstaller
cd /usr/local/cuda-11.2/bin
sudo ./cuda-uninstaller
or, equivalently, in a single command:
sudo /usr/local/cuda-11.2/bin/cuda-uninstaller
Forced removal (not recommended):
sudo rm -rf /usr/local/cuda-11.2
sudo rm -rf /usr/local/cuda
- Remove the environment variables
gedit ~/.bashrc
Delete the following two lines from ~/.bashrc:
export PATH="/usr/local/cuda-11.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"
source ~/.bashrc
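The same cleanup can be done without an editor by filtering the file. This is only a sketch: `strip_lines_containing` is a hypothetical helper, and it writes to stdout rather than editing ~/.bashrc in place so the result can be reviewed first.

```shell
# Hypothetical helper: copy stdin to stdout, dropping every line that contains
# the fixed string $1. `|| true` keeps a "nothing left" result from being
# reported as a failure.
strip_lines_containing() {
    grep -v -F -- "$1" || true
}

# Possible use:
#   strip_lines_containing /usr/local/cuda-11.2/ < ~/.bashrc > ~/.bashrc.new
#   diff ~/.bashrc ~/.bashrc.new    # review, then: mv ~/.bashrc.new ~/.bashrc
```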
3. Installing and configuring cuDNN
3.1 Download a supported cuDNN version
Based on the CUDA version and its release date, choose cuDNN v8.1.0, "cuDNN Library for Linux".
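3.2 Install cuDNN
Extract the downloaded archive, which unpacks into a cuda/ directory, then copy the headers and libraries into the CUDA installation (-P keeps the library symlinks as symlinks):
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/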
3.3 Verify the cuDNN installation
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
hjw@hjw-pc:~$ cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#endif /* CUDNN_VERSION_H */
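The three defines can be combined into a single dotted version string for use in scripts. A minimal sketch, assuming the `#define CUDNN_*` layout shown above; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from the contents of
# cudnn_version.h read on stdin. Prints nothing if CUDNN_MAJOR is absent.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }'
}

# Typical use:
#   cudnn_version < /usr/local/cuda/include/cudnn_version.h
```

Matching the macro name exactly in the second field skips the `CUDNN_VERSION` line, which also mentions `CUDNN_MAJOR`.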
3.4 Uninstall cuDNN
sudo rm -rf /usr/local/cuda/include/cudnn*
sudo rm -rf /usr/local/cuda/lib64/libcudnn*
4. Installing and configuring nvidia-docker
4.1 Environment used for nvidia-docker
Ubuntu 18.04 LTS, Docker version 20.10.5, docker-compose version 1.28.5
4.2 Install the official nvidia-docker package
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
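The `distribution` variable in the first command is built by sourcing /etc/os-release and concatenating two of its fields, which selects the matching repository list. The sketch below isolates that step; `distribution_id` is a hypothetical wrapper that takes the file path as an argument so it can be tried on any os-release-style file.

```shell
# Hypothetical helper: source an os-release-style file ($1) in a subshell and
# print ID followed directly by VERSION_ID, as the install command above does.
distribution_id() (
    . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID"
)

# Typical use:
#   distribution_id /etc/os-release
```

On the Ubuntu 18.04 system used in this post, the fields are `ID=ubuntu` and `VERSION_ID="18.04"`, so the repository URL contains `ubuntu18.04`.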
4.3 Verify the nvidia-docker installation
sudo docker run --rm --gpus all nvidia/cuda:11.2.0-base nvidia-smi
hjw@hjw-pc:~$ sudo docker run --rm --gpus all nvidia/cuda:11.2.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.2.0-base' locally
11.2.0-base: Pulling from nvidia/cuda
f22ccc0b8772: Pull complete
3cf8fb62ba5f: Pull complete
e80c964ece6a: Pull complete
5d59c811e2af: Pull complete
b4113a5e55be: Pull complete
a192f484acd8: Pull complete
Digest: sha256:218afa9c2002be9c4629406c07ae4daaf72a3d65eb3c5a5614d9d7110840a46e
Status: Downloaded newer image for nvidia/cuda:11.2.0-base
Sat Mar 20 13:25:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro T1000 Off | 00000000:01:00.0 Off | N/A |
| N/A 43C P8 5W / N/A | 279MiB / 3911MiB | 13% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Check the configuration file. The entry "graph": "/home/hjw/docker-home" sets Docker's storage path (newer Docker releases name this key "data-root"):
sudo gedit /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
},
"graph": "/home/hjw/docker-home"
}
sudo systemctl daemon-reload
sudo systemctl restart docker.service
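A quick scripted check can confirm that the nvidia runtime entry is present in the configuration file. This is a naive text match, not a JSON parser, and `daemon_has_nvidia_runtime` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if daemon.json text read on stdin contains a
# "path" entry pointing at nvidia-container-runtime. This is a plain text
# match and does not validate the JSON structure.
daemon_has_nvidia_runtime() {
    grep -q '"path"[[:space:]]*:[[:space:]]*"nvidia-container-runtime"'
}

# Typical use:
#   daemon_has_nvidia_runtime < /etc/docker/daemon.json && echo "nvidia runtime configured"
```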