Adding a GPU node to a Kubernetes (k8s) cluster on Ubuntu 20.04
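Prepare the NVIDIA driver and CUDA
- Download the NVIDIA driver and CUDA
NVIDIA driver download page: https://www.nvidia.cn/Download/index.aspx?lang=cn
CUDA download page: https://developer.nvidia.com/cuda-toolkit-archive
- Create an nvidia directory and copy the installers into it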
sudo mkdir /work
sudo chown -R casia:casia /work/
cd /work/
sudo apt-get update
sudo apt-get install -y gcc make python3-pip
mkdir nvidia
cd nvidia/
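Copy the downloaded NVIDIA driver and CUDA installers into this directory.
- Install the NVIDIA driver and CUDA
sudo sh NVIDIA-Linux-x86_64-450.102.04.run
Press Enter three times to accept the default prompts.
sudo sh cuda_11.0.2_450.51.05_linux.run
Type accept and press Enter, then select Install and press Enter.
- Verify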
nvidia-smi
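Beyond reading the nvidia-smi table by eye, the driver version can be compared against a minimum in a script. The sketch below is a hypothetical helper, not part of the original steps. It assumes the minimum is 450.51.05, the driver version embedded in the CUDA runfile name above, and it relies on GNU `sort -V`.

```shell
# version_at_least HAVE WANT
# Succeeds when the dotted version HAVE is greater than or equal to WANT.
version_at_least() {
    local have="$1" want="$2"
    # sort -V orders version strings numerically per component;
    # if WANT comes out first (or both are equal), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Usage on the GPU node (assumes nvidia-smi is on PATH):
#   installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
#   version_at_least "$installed" 450.51.05 && echo "driver is new enough: $installed"
```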
Install nvidia-docker
Before Docker containers with a CUDA environment can be used, the nvidia-docker component must be installed.
Install Docker
sudo apt-get update
sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/ \
    $(lsb_release -cs) \
    stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo gpasswd -a ${USER} docker
sudo service docker restart
Add the nvidia-docker repository
sudo apt-get install -y curl
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
Install nvidia-docker2
After installing nvidia-docker2, restart Docker so that it takes effect.
sudo apt-get install -y nvidia-docker2 vim
sudo systemctl restart docker
Configure nvidia-docker
Edit /etc/docker/daemon.json so that it contains the following:
sudo vim /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
sudo systemctl daemon-reload
sudo systemctl restart docker
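A syntax error in daemon.json prevents Docker from starting, so it can help to check the file before restarting. The following is a minimal sketch, assuming python3 is available (it was installed earlier together with python3-pip). `json_is_valid` is a hypothetical helper name.

```shell
# json_is_valid FILE
# Succeeds only if FILE parses as JSON (uses Python's built-in json.tool).
json_is_valid() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# Usage on the node:
#   json_is_valid /etc/docker/daemon.json && sudo systemctl restart docker
#   docker info --format '{{.DefaultRuntime}}'   # prints the default runtime Docker is using
```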
Disable swap
sudo swapoff -a
sudo vim /etc/fstab
Comment out the /swapfile line so that swap stays disabled after a reboot.
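The manual fstab edit can also be scripted. The sketch below is one possible approach, not the original author's method: it comments out every active swap entry in the given file and keeps a backup with a .bak suffix. `comment_out_swap` is a hypothetical helper name.

```shell
# comment_out_swap FILE
# Prefix each uncommented fstab entry whose type field is "swap" with '#'.
# The untouched original is kept as FILE.bak.
comment_out_swap() {
    sed -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Usage on the node (shell functions are not visible to sudo,
# so run the sed line directly):
#   sudo swapoff -a
#   sudo sed -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab
#   swapon --show   # lists the swap devices that are still active
```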
Install Kubernetes
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
# List the available versions
apt-cache madison kubelet
sudo apt-get install -y kubelet=1.20.5-00 kubeadm=1.20.5-00 kubectl=1.20.5-00
sudo systemctl enable kubelet
sudo systemctl start kubelet
Edit the Docker daemon.json so that Docker uses the systemd cgroup driver
sudo vim /etc/docker/daemon.json
# Add "exec-opts": ["native.cgroupdriver=systemd"]
Restart Docker
sudo systemctl restart docker
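With both changes applied, daemon.json holds the nvidia runtime settings and the cgroup driver option together. The sketch below shows one way the merged file could look. `write_daemon_json` is a hypothetical helper, and the key order is an arbitrary choice.

```shell
# write_daemon_json FILE
# Write a daemon.json that combines the nvidia default runtime
# with the systemd cgroup driver.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
  "default-runtime": "nvidia",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Usage on the node:
#   write_daemon_json /tmp/daemon.json
#   sudo cp /tmp/daemon.json /etc/docker/daemon.json
#   sudo systemctl restart docker
#   docker info --format '{{.CgroupDriver}}'   # prints the cgroup driver Docker is using
```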
Join the node to the cluster
Run on the master
Get a token
kubeadm token list
kubeadm token create --print-join-command
# kubeadm join 192.168.1.2:6443 --token 6yex72.30fxcz9l7ps0zuap --discovery-token-ca-cert-hash sha256:7b97fc66dad88395c35f0164e3c8dcb172476494043381fe5b35acd697f5ad1
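Run on the worker (use the exact join command printed on the master; the token and hash shown here are examples)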
sudo kubeadm join 192.168.1.2:6443 --token 6yex72.30fxcz9l7ps0zuap --discovery-token-ca-cert-hash sha256:7b97fc66dad88395c35f064e3c8dcb172476b494043381fe5b35acd697f5ad1
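The token and CA certificate hash are easy to truncate when the join command is copied between terminals. As a rough pre-check, the sketch below tests their shape before kubeadm join is run: a bootstrap token is six characters, a dot, and sixteen characters from [a-z0-9], and the hash is sha256: followed by 64 hexadecimal digits. `valid_join_args` is a hypothetical helper, and it only checks the format, not whether the values match the cluster.

```shell
# valid_join_args TOKEN HASH
# Succeeds when TOKEN and HASH have the shape kubeadm expects.
valid_join_args() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' &&
    printf '%s\n' "$2" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Usage on the worker (replace the placeholder with the hash printed on the master):
#   valid_join_args 6yex72.30fxcz9l7ps0zuap "sha256:<hash from the master>" \
#       || echo "token or hash looks malformed; copy the join command again"
```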
Install the NFS client
sudo apt install -y nfs-common