Upgrading the AWS EKS Version with Terraform
Check the components in the EKS cluster that need to be upgraded
Upgrade the Calico component from 3.24.5 to 3.26.3
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.3/manifests/calico-vxlan.yaml
kubectl -n kube-system set env daemonset/calico-node FELIX_AWSSRCDSTCHECK=Disable
kubectl delete daemonset -n kube-system aws-node
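An optional quick check (not part of the original steps) that the DaemonSet has rolled over to the new version:
# wait for the calico-node DaemonSet to finish rolling out
kubectl -n kube-system rollout status daemonset/calico-node
# confirm the running image is v3.26.3
kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[0].image}'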
Rolling upgrade of Kong
Save the current Kong Helm values to a file
mkdir 20240309-eks-upgrade
cd 20240309-eks-upgrade
helm get values kong > helm-default-kong-before.yaml
cat helm-default-kong-before.yaml
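The next two commands use the mapkubeapis Helm plugin, which rewrites deprecated or removed API versions in the stored release metadata. If the plugin is not installed yet, a typical install looks like this (assuming the upstream plugin repository):
helm plugin install https://github.com/helm/helm-mapkubeapis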
Dry-run mapkubeapis against the kong release to see what would change
helm mapkubeapis kong --namespace default --dry-run
Run mapkubeapis against the kong release in the default namespace
helm mapkubeapis kong --namespace default
helm upgrade kong kong/kong --version 2.29.0 --values helm-default-kong-before.yaml
kubectl scale -n default deployment kong-kong --replicas=3
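Optionally confirm the rollout finished and the replicas are ready (a generic check, not in the original notes):
kubectl -n default rollout status deployment/kong-kong
kubectl -n default get deployment kong-kong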
cluster-autoscaler
https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
Back up the current cluster-autoscaler manifest
kubectl -n kube-system get deployment/cluster-autoscaler -o yaml > cluster-autoscaler.yaml
Download cluster-autoscaler-autodiscover.yaml
wget https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
Edit cluster-autoscaler-autodiscover.yaml and point the auto-discovery flag at your cluster name
vim cluster-autoscaler-autodiscover.yaml
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/dev-core-eks
Apply the manifest to start the pod
kubectl apply -f cluster-autoscaler-autodiscover.yaml
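It is also worth pinning the cluster-autoscaler image to a release that matches the new Kubernetes minor version; the tag below is only illustrative, so check the autoscaler releases page for the version matching your cluster:
# example: a 1.28.x autoscaler for a 1.28 cluster (tag is illustrative)
kubectl -n kube-system set image deployment/cluster-autoscaler cluster-autoscaler=registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2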
Upgrade metrics-server (the EKS metrics component)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
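A quick sanity check that the new metrics-server is up and serving metrics:
kubectl -n kube-system rollout status deployment/metrics-server
kubectl top nodes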
Upgrade the EBS CSI driver (the EKS storage component)
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.24"
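The kustomize overlay installs the driver into kube-system; the controller Deployment and node DaemonSet can be checked like this (object names assume the upstream manifests):
kubectl -n kube-system rollout status deployment/ebs-csi-controller
kubectl -n kube-system rollout status daemonset/ebs-csi-node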
Test whether PVCs are encrypted
cat > pvctest.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi
EOF
kubectl apply -f pvctest.yaml
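To confirm the provisioned volume really is encrypted, trace the claim to its EBS volume. These checks assume the PVC has bound; with a WaitForFirstConsumer StorageClass it only binds once a pod uses it:
kubectl get pvc ebs-claim
# read the EBS volume ID off the bound PV, then ask EC2 whether it is encrypted
kubectl get pv $(kubectl get pvc ebs-claim -o jsonpath='{.spec.volumeName}') -o jsonpath='{.spec.csi.volumeHandle}'
aws ec2 describe-volumes --volume-ids <volume-id-from-above> --query 'Volumes[0].Encrypted'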
In the AWS console, select the cluster and start the upgrade. Once the control plane has been upgraded, pay attention to the kube-proxy version; the currently recommended version is listed in "Working with the Kubernetes kube-proxy add-on - Amazon EKS".
Check the current version:
kubectl describe daemonset kube-proxy -n kube-system | grep Image
Image: xxxx.amazonaws.com/eks/kube-proxy:v1.18.8-eksbuild.1
Update to the latest kube-proxy version:
kubectl set image daemonset.apps/kube-proxy -n kube-system kube-proxy=xxxx.amazonaws.com/eks/kube-proxy:v1.27.6-minimal-eksbuild.2
Check that the change was applied:
kubectl describe daemonset kube-proxy -n kube-system | grep Image | cut -d ":" -f 3
If images fail to pull, it is usually because the public registries rate-limit pulls from a single source IP; when every pod pulls the same image at once you hit that limit. It is therefore better to pull the images into your own registry first and have the pods pull from there, as in the example below.
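A sketch of mirroring one image into the private registry used by the commands that follow (docker is assumed; the source image tag and region are illustrative, the eks-backup repository name matches the commands below):
# log in to the private ECR registry (region and account are placeholders)
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin xxxx.amazonaws.com
# pull the public image once, retag it into the private repository, and push it
docker pull calico/node:v3.26.3
docker tag calico/node:v3.26.3 xxxx.amazonaws.com/eks-backup:node-v3.26.3
docker push xxxx.amazonaws.com/eks-backup:node-v3.26.3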
# update kong (2.14.0 helm chart) images
kubectl set image --namespace default deployment.apps/kong-kong proxy=xxxx.amazonaws.com/eks-backup:kong-3.1
kubectl set image --namespace default deployment.apps/kong-kong clear-stale-pid=xxxx.amazonaws.com/eks-backup:kong-3.1
# update calico-node(3.26.3) images
kubectl set image ds/calico-node upgrade-ipam=xxxx.amazonaws.com/eks-backup:cni-v3.26.3 -n kube-system
kubectl set image ds/calico-node install-cni=xxxx.amazonaws.com/eks-backup:cni-v3.26.3 -n kube-system
kubectl set image ds/calico-node calico-node=xxxx.amazonaws.com/eks-backup:node-v3.26.3 -n kube-system
kubectl set image ds/calico-node mount-bpffs=xxxx.amazonaws.com/eks-backup:node-v3.26.3 -n kube-system
kubectl set image deployment.apps/calico-kube-controllers calico-kube-controllers=xxxx.amazonaws.com/eks-backup:kube-controllers-3.26.3 -n kube-system
Upgrade the Kubernetes Dashboard (if you do not have it installed, skip this step)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v3.0.0-alpha0/charts/kubernetes-dashboard.yaml
wget https://raw.githubusercontent.com/kubernetes/dashboard/v3.0.0-alpha0/charts/kubernetes-dashboard.yaml
vim kubernetes-dashboard.yaml
The edits made to the downloaded manifest:
248a249 (add)
>   kubernetes.io/ingress.class: kong
255c256 (change)
<   - localhost
---
>   - devops-kubernetes-dashboard.local
258c259 (change)
<   - host: localhost
---
>   tls:
>     - hosts:
>         - localhost
>       secretName: xxx-tls-20250417
>   rules:
>     - host: localhost
Apply the updated manifest:
kubectl apply -f kubernetes-dashboard.yaml
Creating the EKS cluster with Terraform
eks_clusters = {
  prod-core-eks = {
    environment  = "prod"
    name         = "prod-core-eks"
    role_arn_key = "iam-cluster-role"
    vpc_config = {
      endpoint_private_access  = true
      endpoint_public_access   = false
      public_access_cidrs      = null
      security_groups_ids_keys = ["xxx"]
      subnet_ids_keys          = ["xxx-a", "xxx-b", "xxx-c"]
    }
    enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
    encryption_config = {
      provider = {
        key_arn_key = "prod-core-eks"
      }
      resource = ["secrets"]
    }
    kubernetes_network_config = {
      service_ipv4_cidr = null
    }
    tags    = {}
    version = "1.28"
  }
}
Terraform module code for the EKS cluster
resource "aws_eks_cluster" "eks_clusters" {
for_each = var.eks_clusters
name = each.value.name
role_arn = aws_iam_role.iam_roles[each.value.role_arn_key].arn
vpc_config {
endpoint_private_access = each.value.vpc_config.endpoint_private_access
endpoint_public_access = each.value.vpc_config.endpoint_public_access
public_access_cidrs = each.value.vpc_config.public_access_cidrs
security_group_ids = matchkeys(local.security_group_ids, local.security_group_keys, each.value.vpc_config.security_groups_ids_keys)
subnet_ids = matchkeys(local.subnet_ids, local.subnet_keys, each.value.vpc_config.subnet_ids_keys)
}
enabled_cluster_log_types = each.value.enabled_cluster_log_types
dynamic "encryption_config" {
for_each = each.value.encryption_config.provider.key_arn_key != null ? { "encryption_config" = each.value } : {}
content {
provider {
key_arn = aws_kms_key.kms_keys[each.value.encryption_config.provider.key_arn_key].arn
}
resources = each.value.encryption_config.resource
}
}
dynamic "kubernetes_network_config" {
for_each = each.value.kubernetes_network_config.service_ipv4_cidr != null ? { "kubernetes_network_config" = each.value } : {}
content {
service_ipv4_cidr = each.value.kubernetes_network_config.service_ipv4_cidr
}
}
tags = merge(
{
Name = each.key
environment = each.value.environment
Environment = each.value.environment
},
each.value.tags
)
version = each.value.version
lifecycle {
ignore_changes = [
vpc_config[0].security_group_ids
]
}
}
Create the new node groups with Terraform (two node groups are used as an example here; add more as needed). My module code is attached below.
launch_templates = {
  eks-node-group-01 = {
    environment  = "prod"
    name         = "eks-node-group-01"
    description  = "Managed by Terraform"
    key_name_key = "sta-core-eks"
    block_device_mappings = {
      xvda = {
        device_name           = "/dev/xvda"
        delete_on_termination = null
        encrypted             = false
        iops                  = null
        kms_key_id_key        = null
        throughput            = null
        volume_type           = "gp3"
        volume_size           = 100
      }
    }
    tags = {
      Environment = "prod"
    }
    update_default_version      = true
    vpc_security_group_ids_keys = ["core-smg-linux", "eks-node-group"]
  }
  eks-node-group-02 = {
    environment  = "sta"
    name         = "eks-node-group-02"
    description  = "Managed by Terraform"
    key_name_key = "sta-core-eks"
    block_device_mappings = {
      xvda = {
        device_name           = "/dev/xvda"
        delete_on_termination = null
        encrypted             = false
        iops                  = null
        kms_key_id_key        = null
        throughput            = null
        volume_type           = "gp3"
        volume_size           = 100
      }
    }
    tags = {
      Environment = "sta"
    }
    update_default_version      = true
    vpc_security_group_ids_keys = ["sta-core-smg-linux", "eks-node-group"]
  }
}
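The node group resource in the next block iterates over var.eks_node_groups, whose values are not shown in the original notes. A hedged sketch of what one entry might look like, using only the attributes that resource actually reads (all keys and names here are illustrative):
eks_node_groups = {
  prod-eks-node-group-01 = {
    environment       = "prod"
    cluster_name_key  = "prod-core-eks"    # key into eks_clusters above
    node_group_name   = "prod-eks-node-group-01"
    node_role_arn_key = "iam-node-role"    # key into the IAM roles map (illustrative)
    scaling_config = {
      desired_size = 3
      max_size     = 6
      min_size     = 3
    }
    subnet_ids_keys      = ["xxx-a", "xxx-b", "xxx-c"]
    ami_type             = "AL2_x86_64"
    capacity_type        = "ON_DEMAND"
    disk_size            = null            # disk is defined in the launch template
    force_update_version = true
    instance_types       = ["m5.xlarge"]
    labels               = {}
    launch_template = {
      name_key = "eks-node-group-01"       # key into launch_templates above
    }
    release_version = null
    remote_access = {
      ec2_ssh_key_key                = null
      source_security_group_ids_keys = []
    }
    tags    = {}
    version = null                         # null lets EKS use the cluster's Kubernetes version and the template's latest_version
  }
}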
Terraform module code for the EKS node groups
resource "aws_eks_node_group" "eks_node_groups" {
for_each = var.eks_node_groups
cluster_name = aws_eks_cluster.eks_clusters[each.value.cluster_name_key].name
node_group_name = each.value.node_group_name
node_role_arn = aws_iam_role.iam_roles[each.value.node_role_arn_key].arn
scaling_config {
desired_size = each.value.scaling_config.desired_size
max_size = each.value.scaling_config.max_size
min_size = each.value.scaling_config.min_size
}
subnet_ids = matchkeys(local.subnet_ids, local.subnet_keys, each.value.subnet_ids_keys)
ami_type = each.value.ami_type
capacity_type = each.value.capacity_type
disk_size = each.value.disk_size
force_update_version = each.value.force_update_version
instance_types = each.value.instance_types
labels = each.value.labels
# This piece of code works if all node_groups are with taint, but does not work if apply to particular node_group only
# dynamic "taint" {
# for_each = each.value.taint
# content {
# key = each.value.taint.key
# value = each.value.taint.value
# effect = each.value.taint.effect
# }
# }
dynamic "launch_template" {
for_each = each.value.launch_template.name_key != null ? { "launch_template" = each.value } : {}
content {
id = aws_launch_template.launch_templates[each.value.launch_template.name_key].id
version = each.value.version == null ? aws_launch_template.launch_templates[each.value.launch_template.name_key].latest_version : each.value.version
}
}
release_version = each.value.release_version
dynamic "remote_access" {
for_each = each.value.remote_access.ec2_ssh_key_key != null ? { "remote_access" = each.value } : {}
content {
ec2_ssh_key = aws_key_pair.key_pairs[each.value.remote_access.ec2_ssh_key_key].key_name
source_security_group_ids = matchkeys(local.security_group_ids, local.security_group_keys, each.value.remote_access.source_security_group_ids_keys)
}
}
tags = merge(
{
Name = each.key
environment = each.value.environment
#Environment = each.value.environment
},
each.value.tags
)
version = each.value.version
}
Check the existing node versions
Check node status
kubectl get node
Cordon the old nodes and drain the pods running on them
Cordon the old node:
kubectl cordon xxx.compute.internal
Drain the old node:
kubectl drain ip-xxx.compute.internal --ignore-daemonsets --delete-emptydir-data
kubectl drain ip-172-30-16-{xxx,xxx,xxx,xxx,xxx}.xxx.compute.internal --ignore-daemonsets --delete-emptydir-data
Delete the old nodes
kubectl delete node ip-xxx.xxx.compute.internal
Remove the old node group via Terraform
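In practice this means deleting (or commenting out) the old node group's entry from the Terraform variables above and letting Terraform destroy it; a typical workflow, assuming the layout used in this post:
# remove the old node group entry from the variables, then review and apply
terraform plan
terraform apply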
Common git commands for committing the change
git pull                    # pull the latest code
git branch                  # list branches
git branch hugo-20231114    # create a branch
git checkout hugo-20231114  # switch to the branch
git add .                   # stage the changes
git commit -m "xxxx"        # commit with a message
git push                    # push to the remote repository
Check the node group status