K8S Cluster Setup Notes
1. Creating and Configuring the Master
1.1 Environment Preparation
Switch to the root user and prepare the following environment.
- Install Docker
# First remove any old versions
sudo apt-get remove docker docker-engine docker.io containerd runc
# Update the package index
sudo apt-get update
# Allow apt to install packages from a repository over HTTPS
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
# Add Docker's official GPG key (via the Aliyun mirror)
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
# Verify the key fingerprint
sudo apt-key fingerprint 0EBFCD88
# Add the stable repository
sudo add-apt-repository \
"deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
# Install a specific docker-ce version (list available versions with apt-cache madison docker-ce; 17.03.0~ce-0~ubuntu-xenial is used here)
sudo apt-get install docker-ce=17.03.0~ce-0~ubuntu-xenial
# Add your non-root user to the docker group so docker can run without sudo
sudo gpasswd -a <username> docker
# Restart the service and refresh the docker group membership
sudo service docker restart
newgrp - docker
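# Optional sanity check that Docker is installed and can run containers
docker version
docker run --rm hello-world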
- Install kubeadm
- Step 1: disable the firewall and turn off swap
ufw disable
swapoff -a
# Comment out the swap line in /etc/fstab so swap is not re-enabled at boot
vi /etc/fstab
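# Or comment it out non-interactively (a rough sketch; this comments every fstab line containing "swap")
sed -i '/swap/ s/^/#/' /etc/fstab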
- Step 2: configure the package source (note: the Aliyun repository below is the Ubuntu 16.04 xenial one)
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat << EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
- Step 3: install kubeadm, kubelet, and kubectl
apt-get update
apt-get install -y kubelet kubeadm kubectl
- Step 4: enable kubelet to start on boot and start it now
systemctl enable kubelet && systemctl start kubelet
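# Optional: hold the packages so a later apt upgrade does not unexpectedly change cluster component versions
apt-mark hold kubelet kubeadm kubectl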
1.2 Initializing the Master Node
1.2.1 Modify the master node configuration
- (1) Export the default configuration file
kubeadm config print init-defaults > kubeadm.yml
- (2) Edit the following fields in kubeadm.yml
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  # Change to the master node's IP
  advertiseAddress: 192.168.141.130
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubernetes-master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: ""
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
# Google's registry is not reachable from mainland China; use the Aliyun mirror instead
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
# Set the Kubernetes version to install
kubernetesVersion: v1.14.1
networking:
  dnsDomain: cluster.local
  # Use Calico's default pod network CIDR
  podSubnet: "192.168.0.0/16"
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
# Enable IPVS mode
# (for Kubernetes versions before 1.19)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
# (for Kubernetes 1.20 and later, the SupportIPVSProxyMode feature gate no longer exists; use the same block without featureGates)
...
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
...
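IPVS mode also requires the IPVS kernel modules and tools on every node; if they are missing, kube-proxy falls back to iptables mode. A minimal check/load sketch (module names assume a stock Ubuntu kernel; on newer kernels nf_conntrack replaces nf_conntrack_ipv4):
apt-get install -y ipset ipvsadm
for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do modprobe $m; done
lsmod | grep -e ip_vs -e nf_conntrack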
Note: how Kubernetes enables IPVS mode differs across versions (https://www.cnblogs.com/zhangsi-lzq/p/14279997.html)
- Before 1.19, enabling IPVS mode in a kubeadm deployment requires adding the following to the init configuration file:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
- In 1.20, a kubeadm cluster initialization still completes, but when checking the pods you will see that kube-proxy fails to run; part of the error output looks like this:
# Check the kube-proxy logs
kubectl logs kube-proxy-l9twb -n kube-system
F0114 12:58:34.042769 1 server.go:488] failed complete: unrecognized feature gate: SupportIPVSProxyMode
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc00000e001, 0xc0004b6000, 0x6e, 0xc0)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x29b65c0, 0xc000000003, 0x0, 0x0, 0xc0003d8230, 0x28edbc9, 0x9, 0x1e8, 0x0)
Delete the offending kube-proxy settings from the ConfigMap
kube-proxy's configuration file is mounted into the container from a ConfigMap, so editing the corresponding ConfigMap entry is enough to remove the invalid field:
kubectl get cm -n kube-system
NAME DATA AGE
coredns 1 5h18m
extension-apiserver-authentication 6 5h18m
kube-proxy 2 5h18m
kube-root-ca.crt 1 5h18m
kubeadm-config 2 5h18m
kubelet-config-1.20 1 5h18m
kubectl edit cm kube-proxy -n kube-system
# In the editor find the following fields, delete them, then save and exit
featureGates:
  SupportIPVSProxyMode: true
Then delete all the kube-proxy pods so they are recreated with the new configuration, and check that they come back up.
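One way to do that, assuming the kube-proxy pods carry the default k8s-app=kube-proxy label:
kubectl delete pod -n kube-system -l k8s-app=kube-proxy
# The DaemonSet recreates them immediately; confirm they reach Running
kubectl get pod -n kube-system -o wide | grep kube-proxy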
- (3) Pull the images
kubeadm config images pull --config kubeadm.yml
1.2.2 Initialize the master node with kubeadm init
This command initializes the cluster from the configuration file prepared above. The --upload-certs flag (called --experimental-upload-certs in older kubeadm releases) uploads the control-plane certificates so they can be distributed automatically when additional nodes join, and the trailing tee kubeadm-init.log saves the output to a log file:
kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log
If initialization fails midway, or you want to change the configuration, run kubeadm reset to reset the node and then run kubeadm init again; each slave node must also run kubeadm reset and re-join afterwards.
When re-initializing, delete the $HOME/.kube directory first, otherwise you will hit the error: Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
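A rough re-initialization sequence on the master would therefore look like this (a sketch; run as root and adjust to your setup):
kubeadm reset -f
rm -rf $HOME/.kube
kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log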
On success the output looks like this:
[init] Using Kubernetes version: v1.14.1
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.141.130]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master localhost] and IPs [192.168.141.130 127.0.0.1 ::1]
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master localhost] and IPs [192.168.141.130 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 20.003326 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in ConfigMap "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
2cd5b86c4905c54d68cc7dfecc2bf87195e9d5d90b4fff9832d9b22fc5e73f96
[mark-control-plane] Marking the node kubernetes-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node kubernetes-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
# Worker nodes will later join the cluster with the following command
kubeadm join 192.168.141.130:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:cab7c86212535adde6b8d1c7415e81847715cfc8629bb1d270b601744d662515
1.2.3 Configure kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# Run as a non-root user
chown $(id -u):$(id -g) $HOME/.kube/config
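A quick way to confirm that kubectl can now reach the API server:
kubectl cluster-info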
1.2.4 Verify the master configuration
kubectl get node
# If node information like the following is printed, the master was initialized successfully (it stays NotReady until a network plugin is installed in section 3)
NAME STATUS ROLES AGE VERSION
kubernetes-master NotReady master 8m40s v1.14.1
1.2.5 The kubeadm-init.log file
This log file is produced by kubeadm init; it contains the token information needed later when adding slave nodes.
1.2.6 View the master's token information
kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
8ewj1p.9r9hcjoqgajrj4gi   23h   2018-06-12T02:51:28Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.  system:bootstrappers:kubeadm:default-node-token
1.2.7 Recreate a token
kubeadm token create
5didvk.d09sbcov8ph2amjw
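If you only need the full join command for new nodes, kubeadm can generate a token and print the command in one step:
kubeadm token create --print-join-command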
1.2.8 Obtain the --discovery-token-ca-cert-hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
openssl dgst -sha256 -hex | sed 's/^.* //'
8cb2de97839780a412b93877f8507ad6c94f73add17d5d7058e91741c9d5ec78
2. Configuring Slave Nodes and Joining the Cluster
2.1 Slave node environment preparation
Follow the master node's environment preparation: install Docker, kubeadm, and the related components.
2.2 Join a slave node with kubeadm join
First obtain the token and the discovery-token-ca-cert-hash on the master node, then on the slave node switch to the root user and run kubeadm join in the form below, replacing the parameters with your own values:
kubeadm join 192.168.141.130:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:cab7c86212535adde6b8d1c7415e81847715cfc8629bb1d270b601744d662515
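After the join completes, confirm on the master that the new node appears in the node list:
kubectl get nodes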
3. Configuring the Cluster Network
Container networking is the mechanism by which a container connects to other containers, the host, and external networks. Container runtimes offer various network modes, and CNI (Container Network Interface) defines a standard, general-purpose interface between platforms and network solutions: container platforms such as Docker, Kubernetes, and Mesos can all use network solutions such as Flannel, Calico, and Weave, because any solution that implements the standard interface can provide networking to any platform that speaks the same protocol. CNI is exactly that standard interface specification.
In Kubernetes, the kubelet invokes the CNI plugins it finds at the appropriate time to configure networking automatically for the pods it starts.
Common CNI plugins for Kubernetes include:
- Flannel
- Calico
- Canal
- Weave
3.1 Check pod status
kubectl get pod -n kube-system -o wide
3.2 Install the Calico network plugin
Calico provides a secure networking solution for containers and virtual machines. It has been validated at production scale (in public clouds and across clusters with thousands of nodes) and integrates with Kubernetes, OpenShift, Docker, Mesos, DC/OS, and OpenStack.
Calico also enforces network security policy dynamically: using Calico's simple policy language, you can apply fine-grained control over communication between containers, virtual machine workloads, and bare-metal host endpoints.
3.2.1 Install Calico in the cluster
- Install the network plugin
# The "master" manifest installs the latest Calico build; you can also pin a specific version, but check version compatibility
kubectl apply -f https://docs.projectcalico.org/master/manifests/calico.yaml
# Watch until the network plugin pods are running
watch kubectl get pods --all-namespaces
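If you would rather pin a Calico release than track the master manifest, the versioned manifest URL follows the same pattern; v3.18 below is only an illustrative assumption, so check the Calico release notes for the version matching your Kubernetes version:
kubectl apply -f https://docs.projectcalico.org/v3.18/manifests/calico.yaml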
3.2.2 Verify that Calico is running
kubectl get pods --all-namespaces
4. Checking k8s Pod Status
4.1 List the current pods
kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-bb49cbdfb-j4dt2 1/1 Running 0 41s 192.168.189.1 computer-11 <none> <none>
calico-node-6qvzc 1/1 Running 0 41s 192.168.1.9 computer-9 <none> <none>
calico-node-hwnhr 1/1 Running 0 41s 192.168.1.11 computer-11 <none> <none>
calico-node-vtwzm 1/1 Running 0 41s 192.168.1.10 computer-10 <none> <none>
coredns-7f89b7bc75-tkmms 1/1 Running 0 45m 192.168.198.2 computer-9 <none> <none>
coredns-7f89b7bc75-vv96v 1/1 Running 0 45m 192.168.198.1 computer-9 <none> <none>
etcd-computer-9 1/1 Running 0 45m 192.168.1.9 computer-9 <none> <none>
kube-apiserver-computer-9 1/1 Running 0 45m 192.168.1.9 computer-9 <none> <none>
kube-controller-manager-computer-9 1/1 Running 0 45m 192.168.1.9 computer-9 <none> <none>
kube-proxy-d872f 1/1 Running 0 27m 192.168.1.10 computer-10 <none> <none>
kube-proxy-ft7kl 1/1 Running 0 45m 192.168.1.9 computer-9 <none> <none>
kube-proxy-n9fnj 1/1 Running 0 28m 192.168.1.11 computer-11 <none> <none>
kube-scheduler-computer-9 1/1 Running 0 45m 192.168.1.9 computer-9 <none> <none>
4.2 Inspect a pod's status
kubectl describe pod calico-node-fkfgd -n kube-system
Name: calico-node-fkfgd
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: computer-9/192.168.1.9
Start Time: Fri, 29 Jan 2021 16:48:52 +0800
Labels: controller-revision-hash=74dc975d6d
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.1.9
IPs:
IP: 192.168.1.9
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://c8f04bef35ce1fa59d7d0feb98fe9908e838fec01dac72e5d92d2661f8f865f9
Image: docker.io/calico/cni:master
Image ID: docker-
........................
4.3 View a pod's logs
kubectl logs calico-node-fkfgd -n kube-system
2021-02-01 02:24:57.882 [INFO][9] startup/startup.go 383: Early log level set to info
2021-02-01 02:24:57.882 [INFO][9] startup/startup.go 399: Using NODENAME environment for node name computer-9
2021-02-01 02:24:57.882 [INFO][9] startup/startup.go 411: Determined node name: computer-9
2021-02-01 02:24:57.882 [INFO][9] startup/startup.go 103: Starting node computer-9 with version v3.18.0-0.dev-102-ge0f7235846ba
2021-02-01 02:24:57.884 [INFO][9] startup/startup.go 443: Checking datastore connection
2021-02-01 02:25:27.885 [INFO][9] startup/startup.go 458: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
2021-02-01 02:25:58.886 [INFO][9] startup/startup.go 458: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
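The timeouts above mean calico-node cannot reach the API server through the cluster Service IP (10.96.0.1). Two quick checks (a sketch; any HTTP response from curl proves reachability, while a timeout points to firewall or kube-proxy problems):
kubectl get svc kubernetes -n default
curl -k https://10.96.0.1:443/version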
References:
https://blog.csdn.net/csdn_welearn/article/details/91419124
https://www.cnblogs.com/zhangsi-lzq/p/14279997.html