After the server was rebooted, running nvidia-smi failed with the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Running nvcc -V, on the other hand, still worked and printed:

k8s@master:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Solution:

  • sudo apt-get install dkms

  • ll /usr/src/ to find the NVIDIA driver version (nvidia-410.48 on the last line; ll is Ubuntu's default alias for ls -alF)

    k8s@master:~$ ll /usr/src/
    total 36
    drwxr-xr-x  9 root root 4096 Dec 14 06:40 ./
    drwxr-xr-x 12 root root 4096 Dec 27 15:46 ../
    drwxr-xr-x 27 root root 4096 Feb 26  2019 linux-headers-4.15.0-45/
    drwxr-xr-x  8 root root 4096 Feb 26  2019 linux-headers-4.15.0-45-generic/
    drwxr-xr-x 27 root root 4096 Apr  3  2019 linux-headers-4.15.0-47/
    drwxr-xr-x  8 root root 4096 Apr  3  2019 linux-headers-4.15.0-47-generic/
    drwxr-xr-x 25 root root 4096 Dec 13 06:15 linux-headers-4.15.0-72/
    drwxr-xr-x  8 root root 4096 Dec 13 06:15 linux-headers-4.15.0-72-generic/
    drwxr-xr-x  7 root root 4096 Feb 26  2019 nvidia-410.48/
    
  • sudo dkms install -m nvidia -v 410.48 (replace the value after -v with your own NVIDIA driver version)

  • The problem is now fixed. Running nvidia-smi gives the following output:

k8s@master:~$ nvidia-smi
Sun Jan  5 21:10:18 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   33C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:0C:00.0 Off |                    0 |
| N/A   25C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   30C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   25C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
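If you need to repeat this fix on several machines, the version lookup in the second step can be scripted. Below is a minimal POSIX shell sketch. It assumes the driver source is unpacked as /usr/src/nvidia-<version>, like nvidia-410.48 above. The function name nvidia_src_version is hypothetical and is not part of dkms or the NVIDIA driver. The function only prints the version(s) it finds and does not install anything.

```shell
#!/bin/sh
# Hypothetical helper: print the NVIDIA driver version(s) whose source tree
# sits in a directory (default /usr/src), one per line.
# Assumption: the source is unpacked as <dir>/nvidia-<version>,
# e.g. /usr/src/nvidia-410.48 -> prints 410.48.
# Exit status is 1 if no nvidia-* directory is found.
nvidia_src_version() (
    # The ( ) body runs in a subshell, so these variables do not leak.
    src_dir="${1:-/usr/src}"
    status=1
    # The trailing slash makes the glob match directories only.
    for d in "$src_dir"/nvidia-*/; do
        # With no match the pattern stays literal; skip it.
        [ -d "$d" ] || continue
        d="${d%/}"              # drop the trailing slash
        echo "${d##*/nvidia-}"  # keep only the text after "nvidia-"
        status=0
    done
    exit "$status"
)

# Usage (not run here; needs root and dkms installed):
#   ver=$(nvidia_src_version) || { echo "no nvidia-* under /usr/src" >&2; exit 1; }
#   sudo dkms install -m nvidia -v "$ver"
```

The helper only prints, so it is safe to run without root. If it reports more than one version, pick the one that matches your installed driver before you pass it to dkms. After installing, `dkms status` should list the nvidia module against the running kernel (`uname -r`).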