[K8s Storage] The CSI Interface
CSI stands for Container Storage Interface.
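Design Principles
The idea behind the CSI plugin architecture is to strip part of the storage management functionality out of the Kubernetes core code base and turn it into several standalone components. These components use the Watch API to observe storage-related events in Kubernetes, such as PVC creation, and then carry out the corresponding storage management actions.
In the architecture diagram, the green part is the external components: Driver Registrar, External Provisioner, and External Attacher. They correspond exactly to the storage management functionality that was stripped out of the Kubernetes project, and they are still developed and maintained by the Kubernetes community.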
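Driver Registrar: registers the plugin with kubelet (think of it as placing an executable in the plugin directory). In practice, the Driver Registrar calls the CSI plugin's Identity service to obtain the plugin information.
External Provisioner: handles the Provision phase, that is, calling the API of the external persistent storage to allocate storage. In practice, the External Provisioner watches PVC objects in the API server. When a PVC is created, it calls the CSI Controller's CreateVolume method to create the volume, and then creates the corresponding PV for you.
External Attacher: handles the Attach phase (attaching the storage to a node). In practice, it watches VolumeAttachment objects in the API server. A VolumeAttachment object is the key signal by which Kubernetes confirms that a volume may enter the Attach phase: once one appears, the External Attacher calls the CSI Controller service's ControllerPublishVolume method to complete the Attach phase for the corresponding volume.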
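The gray part on the right of the diagram is the CSI plugin, which developers must implement themselves.
CSI Identity: the CSI Identity service exposes information about the plugin itself. Taking the NFS-CSI plugin as an example, it mainly contains: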
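// Source path: csi-driver-nfs/pkg/nfs/identityserver.go
GetPluginInfo()
// Returns the following information
GetPluginInfoResponse{
    Name:          ids.Driver.name,
    VendorVersion: ids.Driver.version,
}
Probe() // Health check: reports whether the plugin is working properly; returns Ready
GetPluginCapabilities() // Returns the plugin's capabilities (the optional services it supports), not a storage capacity
// Capabilities here are feature flags (for example, whether the plugin provides a Controller service); they have nothing to do with capacity limits, which NFS as file storage does not have in the way block storage does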
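To make the shape of the Identity service concrete, here is a minimal, self-contained Go sketch. It does not use the real CSI Go bindings: `PluginInfo`, `Capability`, and this `IdentityServer` are simplified stand-ins invented for illustration, and the version string is a placeholder, so treat it as an outline of the three calls rather than the csi-driver-nfs implementation.

```go
package main

import "fmt"

// PluginInfo mirrors the two fields shown in GetPluginInfoResponse above.
// It is a hypothetical stand-in, not the real CSI message type.
type PluginInfo struct {
	Name          string
	VendorVersion string
}

// Capability is a hypothetical stand-in for a CSI plugin capability flag.
type Capability string

// ControllerService signals that the plugin also offers a Controller service.
const ControllerService Capability = "CONTROLLER_SERVICE"

// IdentityServer is a simplified stand-in for a driver's identity server.
type IdentityServer struct {
	name    string
	version string
	ready   bool
}

// GetPluginInfo exposes the driver name and version to callers such as
// the Driver Registrar.
func (ids *IdentityServer) GetPluginInfo() (PluginInfo, error) {
	if ids.name == "" {
		return PluginInfo{}, fmt.Errorf("driver name not configured")
	}
	return PluginInfo{Name: ids.name, VendorVersion: ids.version}, nil
}

// Probe is the health check: it reports whether the plugin is ready.
func (ids *IdentityServer) Probe() bool { return ids.ready }

// GetPluginCapabilities lists optional features, not storage capacity.
func (ids *IdentityServer) GetPluginCapabilities() []Capability {
	return []Capability{ControllerService}
}

func main() {
	// "v0.0.0-example" is a placeholder version for this sketch.
	ids := &IdentityServer{name: "nfs.csi.k8s.io", version: "v0.0.0-example", ready: true}
	info, err := ids.GetPluginInfo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(info.Name, info.VendorVersion, ids.Probe(), ids.GetPluginCapabilities())
}
```

In a real driver these methods are gRPC handlers served on the plugin's Unix socket; the sketch only keeps the request/response logic.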
CSI Controller: the CSI Controller service defines the management interfaces for CSI Volumes (which correspond to PVs in Kubernetes), such as creating and deleting a CSI Volume, attaching and detaching it (in CSI this operation is called Publish/Unpublish), and taking snapshots of it.
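// Source path: csi-driver-nfs/pkg/nfs/controllerserver.go
CreateVolume() // Creates the volume by calling newNFSVolume()
DeleteVolume()
// File storage has no attach step (no device is attached to a node), so CSI-NFS does not implement the following two methods; only the interface definitions exist
ControllerPublishVolume()
ControllerUnpublishVolume()
// Currently not implemented by CSI-NFS either; only the interface definitions exist
CreateSnapshot()
DeleteSnapshot()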
The interfaces above are called by the External Provisioner and the External Attacher. These two external components cooperate with Kubernetes by watching PVC and VolumeAttachment objects, respectively.
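The cooperation between the External Provisioner and the Controller service can be sketched as a small reconcile function. Everything here is a simplified assumption: `PVC`, `StorageClass`, `Controller`, `fakeController`, and the `"pvc-"` and `"vol-"` naming are hypothetical stand-ins for the real API objects and gRPC client, and a real provisioner receives events from a watch rather than a direct function call.

```go
package main

import "fmt"

// PVC and StorageClass are simplified, hypothetical stand-ins for the
// Kubernetes API objects of the same names.
type PVC struct {
	Name         string
	StorageClass string
	Phase        string // "Pending" or "Bound"
}

type StorageClass struct {
	Name        string
	Provisioner string
}

// Controller is the subset of the CSI Controller service used here.
type Controller interface {
	CreateVolume(name string) (volumeID string, err error)
}

// fakeController records the volumes it was asked to create.
type fakeController struct{ created []string }

func (f *fakeController) CreateVolume(name string) (string, error) {
	f.created = append(f.created, name)
	return "vol-" + name, nil
}

// reconcile handles one PVC event. It calls CreateVolume only when the PVC
// is still Pending and its StorageClass names this driver as provisioner.
// The boolean result reports whether the PVC was handled by this driver.
func reconcile(pvc PVC, classes map[string]StorageClass, driver string, c Controller) (string, bool, error) {
	if pvc.Phase != "Pending" {
		return "", false, nil
	}
	sc, ok := classes[pvc.StorageClass]
	if !ok || sc.Provisioner != driver {
		return "", false, nil // some other provisioner owns this PVC
	}
	id, err := c.CreateVolume("pvc-" + pvc.Name)
	if err != nil {
		return "", false, err
	}
	return id, true, nil
}

func main() {
	classes := map[string]StorageClass{
		"nfs-csi": {Name: "nfs-csi", Provisioner: "nfs.csi.k8s.io"},
	}
	fc := &fakeController{}
	id, handled, err := reconcile(
		PVC{Name: "demo", StorageClass: "nfs-csi", Phase: "Pending"},
		classes, "nfs.csi.k8s.io", fc)
	fmt.Println(id, handled, err)
}
```

The filter on the provisioner name is the important design point: several provisioners can watch the same PVCs, and each acts only on the claims whose StorageClass points at it.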
CSI Node: the CSI Node service contains every operation that must run on the host.
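// Source path: csi-driver-nfs/pkg/nfs/nodeserver.go
NodePublishVolume() // Mounts the volume into the pod, using the volumeID created earlier
NodeUnpublishVolume() // Unmounts the volume
// The following three interfaces are not implemented
NodeStageVolume() // Temporarily mounts the volume at a staging path
NodeUnstageVolume() // Unmounts the volume from the staging path
NodeExpandVolume() // Expands the volume
NodeGetInfo() // Returns the node ID
NodeGetCapabilities() // Returns the Node service's capabilities (supported features, not capacity)
NodeGetVolumeStats() // Returns the volume's usage statistics
// Returns the following information
Usage: []*csi.VolumeUsage{
    {
        Unit:      csi.VolumeUsage_BYTES,
        Available: available,
        Total:     capacity,
        Used:      used,
    },
    {
        Unit:      csi.VolumeUsage_INODES,
        Available: inodesFree,
        Total:     inodes,
        Used:      inodesUsed,
    },
},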
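The values in the usage response above are typically derived from raw filesystem counters. The following Go sketch shows one plausible derivation; the `fsCounters` and `Usage` types are hypothetical, and the formulas (capacity from total blocks, available from blocks usable by unprivileged users, used from total minus free) are an assumption modeled on common `statfs` arithmetic, not taken from the driver's source.

```go
package main

import "fmt"

// Usage is a simplified stand-in for one entry of the usage list above.
type Usage struct {
	Unit      string
	Available int64
	Total     int64
	Used      int64
}

// fsCounters holds raw filesystem counters such as those a statfs call
// would return. The field names are chosen for this sketch.
type fsCounters struct {
	BlockSize   int64
	Blocks      int64 // total blocks
	BlocksFree  int64 // free blocks, including reserved ones
	BlocksAvail int64 // blocks available to unprivileged users
	Inodes      int64 // total inodes
	InodesFree  int64 // free inodes
}

// usageFrom converts raw counters into the byte and inode usage entries.
func usageFrom(c fsCounters) []Usage {
	capacity := c.Blocks * c.BlockSize
	available := c.BlocksAvail * c.BlockSize
	used := (c.Blocks - c.BlocksFree) * c.BlockSize
	inodesUsed := c.Inodes - c.InodesFree
	return []Usage{
		{Unit: "BYTES", Available: available, Total: capacity, Used: used},
		{Unit: "INODES", Available: c.InodesFree, Total: c.Inodes, Used: inodesUsed},
	}
}

func main() {
	// Made-up counters, for illustration only.
	c := fsCounters{BlockSize: 4096, Blocks: 100, BlocksFree: 40, BlocksAvail: 30, Inodes: 1000, InodesFree: 900}
	for _, u := range usageFrom(c) {
		fmt.Printf("%s total=%d used=%d available=%d\n", u.Unit, u.Total, u.Used, u.Available)
	}
}
```

Note that used plus available can be smaller than the total, because blocks reserved for privileged users count as neither.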
During the Mount phase (when kubelet calls the CSI interface to mount the volume into the pod), NodeStageVolume and NodePublishVolume usually work together.
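For a driver that implements both calls (unlike CSI-NFS, which per the listing above leaves NodeStageVolume unimplemented), the cooperation can be sketched as two steps: stage the volume once per node, then publish it once per pod. The Go sketch below replaces real mount syscalls with an in-memory table; `Node`, `mountTable`, and all paths are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// mountTable maps a mount target to its source. It is an in-memory
// stand-in for real mount syscalls, used only for illustration.
type mountTable map[string]string

// Node is a simplified stand-in for a CSI Node service.
type Node struct{ mounts mountTable }

// NodeStageVolume mounts the remote source once per node at a staging path.
// Repeating the call with the same arguments is a no-op (idempotent).
func (n *Node) NodeStageVolume(source, stagingPath string) error {
	if cur, ok := n.mounts[stagingPath]; ok {
		if cur == source {
			return nil
		}
		return errors.New("staging path already used by another volume")
	}
	n.mounts[stagingPath] = source
	return nil
}

// NodePublishVolume bind-mounts the staged path into one pod's directory.
// It fails if the volume has not been staged first.
func (n *Node) NodePublishVolume(stagingPath, targetPath string) error {
	if _, ok := n.mounts[stagingPath]; !ok {
		return errors.New("volume is not staged")
	}
	n.mounts[targetPath] = stagingPath
	return nil
}

func main() {
	n := &Node{mounts: mountTable{}}
	// All paths below are made up for this example.
	fmt.Println(n.NodeStageVolume("nfs-server:/share", "/staging/vol1"))
	fmt.Println(n.NodePublishVolume("/staging/vol1", "/pods/pod-a/vol1"))
	fmt.Println(n.NodePublishVolume("/staging/vol1", "/pods/pod-b/vol1"))
	fmt.Println(len(n.mounts))
}
```

Splitting the work this way lets several pods on one node share a single staged mount, with each pod getting only a cheap bind mount.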
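Mount Flow at Pod Creation
The flow is as follows:
- The user creates a Pod that references a PVC, and the PVC requests a dynamically provisioned volume;
- The scheduler places the Pod on a suitable worker node based on the Pod spec, node state, PV configuration, and so on;
- The PV controller sees that the PVC used by the Pod is Pending, so it calls the volume plugin (in-tree) to create the volume and the PV object (for out-of-tree plugins, the External Provisioner does this);
- The AD controller finds that the Pod and PVC are waiting to be attached, so it calls the volume plugin to attach the storage device to the target worker node;
- On the worker node, the Volume Manager in kubelet waits for the device to be attached, then uses the volume plugin to mount it to the global directory /var/lib/kubelet/pods/[pod uid]/volumes/kubernetes.io~iscsi/[PV name] (iSCSI in this example);
- kubelet starts the Pod's containers through Docker and bind-mounts the volume from that local global directory into the containers.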
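The directory template in the mount step can be expressed as a small helper, which is handy when inspecting a node by hand. This Go sketch only assembles the path pattern quoted above; the function name `podVolumeDir` and the sample pod UID and PV name are made up.

```go
package main

import (
	"fmt"
	"path"
)

// kubeletPodsDir is the pods directory used in the path template above.
const kubeletPodsDir = "/var/lib/kubelet/pods"

// podVolumeDir builds the per-pod volume directory from the template
// /var/lib/kubelet/pods/[pod uid]/volumes/[plugin dir]/[PV name].
func podVolumeDir(podUID, pluginDir, pvName string) string {
	return path.Join(kubeletPodsDir, podUID, "volumes", pluginDir, pvName)
}

func main() {
	// The UID and PV name are hypothetical values for illustration.
	fmt.Println(podVolumeDir("1234-abcd", "kubernetes.io~iscsi", "pv-demo"))
}
```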
[Figure: a more detailed view of the flow]
Deployment
rbac-csi-nfs.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-nfs-controller-sa
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-nfs-node-sa
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-external-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotclasses", "volumesnapshots"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["csinodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-csi-provisioner-binding
subjects:
  - kind: ServiceAccount
    name: csi-nfs-controller-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: nfs-external-provisioner-role
  apiGroup: rbac.authorization.k8s.io
csi-nfs-driverinfo.yaml
---
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: nfs.csi.k8s.io
spec:
  attachRequired: false
  volumeLifecycleModes:
    - Persistent
  fsGroupPolicy: File
csi-nfs-controller.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: csi-nfs-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: csi-nfs-controller
  template:
    metadata:
      labels:
        app: csi-nfs-controller
    spec:
      hostNetwork: true # controller also needs to mount nfs to create dir
      dnsPolicy: ClusterFirstWithHostNet # available values: Default, ClusterFirstWithHostNet, ClusterFirst
      serviceAccountName: csi-nfs-controller-sa
      nodeSelector:
        kubernetes.io/os: linux # add "kubernetes.io/role: master" to run controller on master node
      priorityClassName: system-cluster-critical
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node-role.kubernetes.io/controlplane"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: csi-provisioner
          image: feyico/csi-provisioner:v3.5.0
          args:
            - "-v=2"
            - "--csi-address=$(ADDRESS)"
            - "--leader-election"
            - "--leader-election-namespace=kube-system"
            - "--extra-create-metadata=true"
            - "--timeout=1200s"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
          resources:
            limits:
              memory: 400Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: csi-snapshotter
          image: feyico/csi-snapshotter:v6.2.2
          args:
            - "--v=2"
            - "--csi-address=$(ADDRESS)"
            - "--leader-election-namespace=kube-system"
            - "--leader-election"
            - "--timeout=1200s"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: liveness-probe
          image: feyico/livenessprobe:v2.10.0
          args:
            - --csi-address=/csi/csi.sock
            - --probe-timeout=3s
            - --health-port=29652
            - --v=2
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: nfs
          image: feyico/nfsplugin:canary
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          imagePullPolicy: IfNotPresent
          args:
            - "-v=5"
            - "--nodeid=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
          ports:
            - containerPort: 29652
              name: healthz
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 30
          volumeMounts:
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: "Bidirectional"
            - mountPath: /csi
              name: socket-dir
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
      volumes:
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory
        - name: socket-dir
          emptyDir: {}
csi-nfs-node.yaml
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-nfs-node
  namespace: kube-system
spec:
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app: csi-nfs-node
  template:
    metadata:
      labels:
        app: csi-nfs-node
    spec:
      hostNetwork: true # original nfs connection would be broken without hostNetwork setting
      dnsPolicy: ClusterFirstWithHostNet # available values: Default, ClusterFirstWithHostNet, ClusterFirst
      serviceAccountName: csi-nfs-node-sa
      priorityClassName: system-node-critical
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - operator: "Exists"
      containers:
        - name: liveness-probe
          image: feyico/livenessprobe:v2.10.0
          args:
            - --csi-address=/csi/csi.sock
            - --probe-timeout=3s
            - --health-port=29653
            - --v=2
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: node-driver-registrar
          image: feyico/csi-node-driver-registrar:v2.8.0
          args:
            - --v=2
            - --csi-address=/csi/csi.sock
            - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
          livenessProbe:
            exec:
              command:
                - /csi-node-driver-registrar
                - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
                - --mode=kubelet-registration-probe
            initialDelaySeconds: 30
            timeoutSeconds: 15
          env:
            - name: DRIVER_REG_SOCK_PATH
              value: /var/lib/kubelet/plugins/csi-nfsplugin/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: nfs
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: feyico/nfsplugin:canary
          args:
            - "-v=5"
            - "--nodeid=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
          ports:
            - containerPort: 29653
              name: healthz
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 30
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: "Bidirectional"
          resources:
            limits:
              memory: 300Mi
            requests:
              cpu: 10m
              memory: 20Mi
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/csi-nfsplugin
            type: DirectoryOrCreate
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory
        - hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: Directory
          name: registration-dir
Run the following commands in order:
kubectl apply -f rbac-csi-nfs.yaml
kubectl apply -f csi-nfs-driverinfo.yaml
kubectl apply -f csi-nfs-controller.yaml
kubectl apply -f csi-nfs-node.yaml
[Figure: running state of the Pods after deployment]
Usage
PV Provisioning
Dynamic
Create the StorageClass
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: 127.0.0.1
  share: /data/nfsshare
  # csi.storage.k8s.io/provisioner-secret is only needed for providing mountOptions in DeleteVolume
  # csi.storage.k8s.io/provisioner-secret-name: "mount-options"
  # csi.storage.k8s.io/provisioner-secret-namespace: "default"
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
Create the PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc1-delete-dyn
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: nfs-csi
Static
Create the PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  mountOptions:
    - hard
    - nfsvers=3
  csi:
    driver: nfs.csi.k8s.io
    readOnly: false
    volumeHandle: unique-volumeid # make sure it's a unique id in the cluster
    volumeAttributes:
      server: 127.0.0.1
      share: /data/nfsshare
Using It from a Deployment
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-deployment-nfs
spec:
  accessModes:
    - ReadWriteMany # In this example, multiple Pods consume the same PVC.
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-csi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-nfs
spec:
  replicas: 1
  selector:
    matchLabels:
      name: deployment-nfs
  template:
    metadata:
      name: deployment-nfs
      labels:
        name: deployment-nfs
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
        - name: deployment-nfs
          image: nginx:1.19.5
          imagePullPolicy: IfNotPresent
          command:
            - "/bin/bash"
            - "-c"
            - set -euo pipefail; while true; do echo $(hostname) $(date) >> /mnt/nfs/outfile; sleep 1; done
          volumeMounts:
            - name: nfs
              mountPath: "/mnt/nfs"
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: pvc-deployment-nfs
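NFS-CSI Lifecycle
- Register:
The following is performed by the node-driver-registrar container in the csi-nfs-node DaemonSet Pod.
The node-driver-registrar container mounts kubelet's plugins_registry directory as a hostPath volume (path: /var/lib/kubelet/plugins_registry/) and places its registration socket file (nfs.csi.k8s.io-reg.sock) there.
The Pod also mounts a hostPath directory under kubelet's plugins directory (path: /var/lib/kubelet/plugins), creating the csi-nfsplugin directory, where the plugin's socket file is placed (csi-nfsplugin/csi.sock). All later RPC calls to the plugin go through this socket.
The RPC service is then started. The external component Driver Registrar relies on kubelet's plugin watcher, which watches the specified directory, so that the storage plugin is detected automatically. The driver information is then obtained by calling the Identity RPC service, and registration completes.
- Provision:
The following is performed by the csi-provisioner container in the csi-nfs-controller Deployment Pod.
The External Provisioner watches the API server for newly created PVCs. When a PVC's StorageClass names the plugin started above as its provisioner (nfs.csi.k8s.io), the External Provisioner calls the plugin's Controller CreateVolume() service. Its main job is to connect to the NFS server as an NFS client, create a directory and set its permissions, and return the network mount path (the path of the PV's directory on the NFS server).
- Attach: (NFS has no such step; block storage is used as the example)
Deploy the External Attacher. The Attacher watches VolumeAttachment objects in the API server. When a new VolumeAttachment appears, it calls the plugin's Controller ControllerPublishVolume() service. Its main job is to call the storage backend's API to attach the disk to the node where the Pod using this PVC/PV has been scheduled. Mount directory: /var/lib/kubelet/pods/<Pod ID>/volumes/<storage provisioner>/<name>
- Mount:
The following is performed by the nfs container in the csi-nfs-node DaemonSet Pod.
The mount cannot be done from a remote container, so kubelet has to drive it. When kubelet detects that a mount is needed, it goes through the pkg/volume/csi package to call the NodePublishVolume interface of the CSI Node service, which completes the Mount phase by mounting the remote directory on the local server. kubelet then calls the CRI to start the container with the volume parameters, mapping the volume prepared in the previous phase to the directory specified in the container.
- Unmount:
The following is performed by the nfs container in the csi-nfs-node DaemonSet Pod.
kubelet calls the NodeUnpublishVolume interface of the CSI Node service to unmount the volume.
- Delete:
The following is performed by the csi-provisioner container in the csi-nfs-controller Deployment Pod.
csi-provisioner reclaims the volume according to the reclaimPolicy defined in the StorageClass. Here the policy is Delete, so it deletes the remote PV, that is, removes the corresponding directory.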
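The Provision and Delete ends of this lifecycle boil down to creating and removing a per-volume directory on the share. The Go sketch below imitates that with a temporary local directory standing in for the NFS export; `provision`, `reclaim`, `runLifecycle`, and the volume name `pvc-example` are hypothetical, and a real driver would first mount the share and would handle permissions and errors more carefully.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// provision plays the role of CreateVolume: it creates one subdirectory
// per PV under the share root and returns its path.
func provision(shareRoot, pvName string) (string, error) {
	dir := filepath.Join(shareRoot, pvName)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	return dir, nil
}

// reclaim plays the role of DeleteVolume, driven by the reclaim policy:
// "Delete" removes the directory, any other policy keeps the data.
func reclaim(dir, policy string) error {
	if policy != "Delete" {
		return nil
	}
	return os.RemoveAll(dir)
}

// exists reports whether the path can be stat'ed.
func exists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

// runLifecycle provisions a volume in a temporary "share", reclaims it,
// and reports whether the directory existed after each step.
func runLifecycle(policy string) (afterProvision, afterReclaim bool, err error) {
	root, err := os.MkdirTemp("", "nfsshare-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root) // clean up the fake share itself

	dir, err := provision(root, "pvc-example")
	if err != nil {
		return false, false, err
	}
	afterProvision = exists(dir)
	if err := reclaim(dir, policy); err != nil {
		return afterProvision, false, err
	}
	return afterProvision, exists(dir), nil
}

func main() {
	for _, policy := range []string{"Delete", "Retain"} {
		created, kept, err := runLifecycle(policy)
		fmt.Println(policy, created, kept, err)
	}
}
```

With the `Delete` policy used by the StorageClass in this article, the directory is gone after reclaim; with `Retain` it would stay on the server for manual cleanup.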
Source Code Analysis
Todo