CSI is short for Container Storage Interface.

Design Principles

The design idea behind the CSI plugin system is to carve part of Kubernetes' storage-management functionality out of the core codebase and turn it into several standalone components. These components watch storage-related events in Kubernetes through the Watch API, such as the creation of a PVC, and then perform the corresponding storage-management actions.
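As a minimal sketch of this watch-based pattern (assuming client-go and in-cluster configuration; the real components use informers with retries and leader election, so this is purely illustrative):

// Hypothetical sketch: watching PVC creation the way an external component would.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes the component runs in a pod
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	w, err := clientset.CoreV1().PersistentVolumeClaims(metav1.NamespaceAll).
		Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		pvc, ok := ev.Object.(*corev1.PersistentVolumeClaim)
		if !ok {
			continue
		}
		// A provisioner would check the PVC's StorageClass here and, if it
		// names its own driver, issue the CSI CreateVolume RPC.
		fmt.Printf("event=%s pvc=%s/%s phase=%s\n",
			ev.Type, pvc.Namespace, pvc.Name, pvc.Status.Phase)
	}
}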

The green parts in the architecture diagram are the external components, namely Driver Registrar, External Provisioner, and External Attacher. They correspond exactly to the storage-management functionality carved out of the Kubernetes project, and they are still developed and maintained by the Kubernetes community.

Driver Registrar: registers the plugin with the kubelet (you can think of this as dropping an executable into a plugin directory). Concretely, the Driver Registrar queries the CSI plugin's Identity service to obtain the plugin's information.

External Provisioner: responsible for the Provision phase, i.e., calling the external persistent storage's API to allocate storage. Concretely, the External Provisioner watches PVC objects in the APIServer; when a PVC is created, it calls the CSI Controller's CreateVolume method to create the corresponding PV.

External Attacher: responsible for the Attach phase (i.e., attaching the storage to a node). Concretely, it watches VolumeAttachment objects in the APIServer. A VolumeAttachment object is Kubernetes' signal that a volume may enter the Attach phase; as soon as one appears, the External Attacher calls the CSI Controller service's ControllerPublishVolume method to carry out the Attach phase for the corresponding volume.
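A comparable sketch for the VolumeAttachment side (again illustrative; the real external-attacher is informer-based):

// Hypothetical sketch: reacting to new VolumeAttachment objects.
package main

import (
	"context"
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

func watchAttachments(clientset *kubernetes.Clientset) error {
	w, err := clientset.StorageV1().VolumeAttachments().
		Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	for ev := range w.ResultChan() {
		va, ok := ev.Object.(*storagev1.VolumeAttachment)
		if !ok || ev.Type != watch.Added {
			continue
		}
		// An attacher compares va.Spec.Attacher with its own driver name and,
		// on a match, calls the CSI ControllerPublishVolume RPC for the PV
		// named in va.Spec.Source.PersistentVolumeName on node va.Spec.NodeName.
		fmt.Printf("attach requested: pv=%v node=%s\n",
			*va.Spec.Source.PersistentVolumeName, va.Spec.NodeName)
	}
	return nil
}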

The gray part on the right is the CSI plugin itself, which plugin developers implement in code.

CSI Identity: the CSI Identity service exposes information about the plugin itself. Taking the NFS CSI plugin as an example, it mainly includes:

// Source path: csi-driver-nfs/pkg/nfs/identityserver.go

GetPluginInfo()
// returns the following information
GetPluginInfoResponse{
   Name:          ids.Driver.name,
   VendorVersion: ids.Driver.version,
}

Probe() // health check: reports whether the plugin is working; returns Ready
GetPluginCapabilities() // returns the plugin's capabilities (not its capacity)
// NFS, being file storage, has no block-storage-style capacity limit,
// so no capacity information needs to be reported here
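For reference, a minimal Identity server along these lines could look like the following (a hedged sketch built on the csi protobuf package from the CSI spec repo, not the literal csi-driver-nfs source; the Driver struct is a stand-in):

// Hypothetical sketch of a CSI Identity server.
package nfs

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

// Driver is a stand-in holding the plugin's name and version.
type Driver struct {
	name    string
	version string
}

type IdentityServer struct {
	Driver *Driver
}

func (ids *IdentityServer) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
	return &csi.GetPluginInfoResponse{
		Name:          ids.Driver.name,
		VendorVersion: ids.Driver.version,
	}, nil
}

func (ids *IdentityServer) Probe(ctx context.Context, req *csi.ProbeRequest) (*csi.ProbeResponse, error) {
	// Report the plugin as healthy and ready.
	return &csi.ProbeResponse{Ready: &wrapperspb.BoolValue{Value: true}}, nil
}

func (ids *IdentityServer) GetPluginCapabilities(ctx context.Context, req *csi.GetPluginCapabilitiesRequest) (*csi.GetPluginCapabilitiesResponse, error) {
	// Advertise that the plugin also offers the Controller service.
	return &csi.GetPluginCapabilitiesResponse{
		Capabilities: []*csi.PluginCapability{{
			Type: &csi.PluginCapability_Service_{
				Service: &csi.PluginCapability_Service{
					Type: csi.PluginCapability_Service_CONTROLLER_SERVICE,
				},
			},
		}},
	}, nil
}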

CSI Controller: the CSI Controller service defines the management interface for CSI volumes (which correspond to PVs in Kubernetes), for example: creating and deleting CSI volumes, Attach/Detach of a CSI volume (in CSI these operations are called Publish/Unpublish), and snapshotting a CSI volume.

// Source path: csi-driver-nfs/pkg/nfs/controllerserver.go

CreateVolume() // creates the volume via the newNFSVolume() helper
DeleteVolume()

// File storage needs no attach step, so CSI-NFS leaves the following two
// methods as interface definitions with no implementation
ControllerPublishVolume()
ControllerUnpublishVolume()

// Currently also unimplemented in CSI-NFS; interface definitions only
CreateSnapshot()
DeleteSnapshot()

The interfaces above are invoked by the External Provisioner and the External Attacher. These two external components cooperate with Kubernetes by watching PVC and VolumeAttachment objects respectively.
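To make this cooperation concrete, here is a hedged sketch of how a sidecar reaches the plugin: it dials the shared Unix socket and issues a Controller RPC (the socket path and request values are illustrative, not the external-provisioner's actual code):

// Hypothetical sketch: dialing the CSI socket and calling CreateVolume.
package main

import (
	"context"
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The sidecars and the plugin share an emptyDir, so the socket is local.
	conn, err := grpc.Dial("unix:///csi/csi.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	ctrl := csi.NewControllerClient(conn)
	resp, err := ctrl.CreateVolume(context.Background(), &csi.CreateVolumeRequest{
		Name:          "pvc-1234", // derived from the PVC by the provisioner
		CapacityRange: &csi.CapacityRange{RequiredBytes: 3 << 30}, // 3Gi
		Parameters: map[string]string{ // StorageClass parameters
			"server": "127.0.0.1",
			"share":  "/data/nfsshare",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("created volume:", resp.GetVolume().GetVolumeId())
}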

CSI Node: the CSI Node service contains all of the operations that need to be executed on the host machine.

// Source path: csi-driver-nfs/pkg/nfs/nodeserver.go

NodePublishVolume() // mounts the volume identified by the previously created volumeID into the pod
NodeUnpublishVolume() // unmounts the volume

// The following three interfaces are not implemented
NodeStageVolume() // mounts the volume to a temporary staging path
NodeUnstageVolume() // unmounts the volume from the temporary staging path
NodeExpandVolume() // expands the volume

NodeGetInfo() // returns the node ID
NodeGetCapabilities() // returns the capabilities of the node service
NodeGetVolumeStats() // returns usage statistics for a volume
// returns the following information
Usage: []*csi.VolumeUsage{
  {
       Unit:      csi.VolumeUsage_BYTES,
       Available: available,
       Total:     capacity,
       Used:      used,
  },
  {
       Unit:      csi.VolumeUsage_INODES,
       Available: inodesFree,
       Total:     inodes,
       Used:      inodesUsed,
  },
},
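Those figures typically come from a statfs(2) call against the volume path; a hedged sketch of how such numbers can be gathered (using golang.org/x/sys/unix; not the literal driver code):

// Hypothetical sketch: collecting volume usage via statfs(2).
package nfs

import "golang.org/x/sys/unix"

type volumeStats struct {
	availableBytes, totalBytes, usedBytes int64
	inodesFree, inodes, inodesUsed        int64
}

func getVolumeStats(volumePath string) (volumeStats, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(volumePath, &st); err != nil {
		return volumeStats{}, err
	}
	total := int64(st.Blocks) * int64(st.Bsize)
	avail := int64(st.Bavail) * int64(st.Bsize)
	return volumeStats{
		availableBytes: avail,
		totalBytes:     total,
		usedBytes:      total - int64(st.Bfree)*int64(st.Bsize),
		inodesFree:     int64(st.Ffree),
		inodes:         int64(st.Files),
		inodesUsed:     int64(st.Files) - int64(st.Ffree),
	}, nil
}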

During the Mount phase (when the kubelet calls the CSI interfaces to mount the volume into the pod), NodeStageVolume and NodePublishVolume normally work together.
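A hedged sketch of that two-phase split for a driver that implements both calls (csi-driver-nfs itself skips staging and mounts directly in NodePublishVolume; the mounter, source, and paths here are illustrative):

// Hypothetical sketch of the stage/publish split (not the csi-driver-nfs code).
package nfs

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	mount "k8s.io/mount-utils"
)

type NodeServer struct {
	mounter mount.Interface // e.g. mount.New("")
}

// NodeStageVolume mounts the backing storage once per node at a global path.
func (ns *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
	// req.StagingTargetPath is something like
	// /var/lib/kubelet/plugins/kubernetes.io/csi/.../globalmount
	err := ns.mounter.Mount("127.0.0.1:/data/nfsshare", // illustrative source
		req.GetStagingTargetPath(), "nfs", []string{"nfsvers=4.1"})
	if err != nil {
		return nil, err
	}
	return &csi.NodeStageVolumeResponse{}, nil
}

// NodePublishVolume bind-mounts the staged volume into the pod's directory.
func (ns *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	// req.TargetPath is /var/lib/kubelet/pods/<pod uid>/volumes/...
	err := ns.mounter.Mount(req.GetStagingTargetPath(),
		req.GetTargetPath(), "", []string{"bind"})
	if err != nil {
		return nil, err
	}
	return &csi.NodePublishVolumeResponse{}, nil
}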

Mount Flow at Pod Creation

The flow is as follows:

  1. A user creates a Pod with a PVC that requests a dynamically provisioned volume;
  2. The Scheduler places the Pod on a suitable worker node based on the Pod spec, node status, PV configuration, and other information;
  3. The PV controller sees that the PVC used by this Pod is in the Pending state, so it calls the Volume Plugin (in-tree) to create the volume and the PV object (for out-of-tree plugins this is handled by the External Provisioner);
  4. The AD controller notices that the Pod and PVC are waiting to be attached, so it calls the Volume Plugin to attach the storage device to the target worker node;
  5. On the worker node, the kubelet's Volume Manager waits for the attachment to complete, then uses the Volume Plugin to mount the device into the pod's volume directory: /var/lib/kubelet/pods/[pod uid]/volumes/kubernetes.io~iscsi/[PV name];
  6. The kubelet starts the Pod's containers via the container runtime (e.g. Docker) and maps the locally mounted volume into the containers with a bind mount (see the sketch below).
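The bind mount in step 6 boils down to a mount(2) call with MS_BIND; a minimal illustrative sketch (both paths are hypothetical):

// Hypothetical sketch: the bind mount the runtime performs in step 6.
package main

import "golang.org/x/sys/unix"

func main() {
	// Volume already mounted on the host by the CSI node plugin (hypothetical pod uid):
	src := "/var/lib/kubelet/pods/0000-1111/volumes/kubernetes.io~csi/pv-nfs/mount"
	// Mount point inside the container's root filesystem (hypothetical rootfs path):
	dst := "/run/containerd/rootfs-example/mnt/nfs"
	if err := unix.Mount(src, dst, "", unix.MS_BIND, ""); err != nil {
		panic(err)
	}
}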


Deployment

rbac-csi-nfs.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-nfs-controller-sa
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-nfs-node-sa
  namespace: kube-system
---

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-external-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotclasses", "volumesnapshots"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["csinodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-csi-provisioner-binding
subjects:
  - kind: ServiceAccount
    name: csi-nfs-controller-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: nfs-external-provisioner-role
  apiGroup: rbac.authorization.k8s.io

csi-nfs-driverinfo.yaml

---
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: nfs.csi.k8s.io
spec:
  attachRequired: false
  volumeLifecycleModes:
    - Persistent
  fsGroupPolicy: File

csi-nfs-controller.yaml

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: csi-nfs-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: csi-nfs-controller
  template:
    metadata:
      labels:
        app: csi-nfs-controller
    spec:
      hostNetwork: true  # controller also needs to mount nfs to create dir
      dnsPolicy: ClusterFirstWithHostNet  # available values: Default, ClusterFirstWithHostNet, ClusterFirst
      serviceAccountName: csi-nfs-controller-sa
      nodeSelector:
        kubernetes.io/os: linux  # add "kubernetes.io/role: master" to run controller on master node
      priorityClassName: system-cluster-critical
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node-role.kubernetes.io/controlplane"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: csi-provisioner
          image: feyico/csi-provisioner:v3.5.0
          args:
            - "-v=2"
            - "--csi-address=$(ADDRESS)"
            - "--leader-election"
            - "--leader-election-namespace=kube-system"
            - "--extra-create-metadata=true"
            - "--timeout=1200s"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
          resources:
            limits:
              memory: 400Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: csi-snapshotter
          image: feyico/csi-snapshotter:v6.2.2
          args:
            - "--v=2"
            - "--csi-address=$(ADDRESS)"
            - "--leader-election-namespace=kube-system"
            - "--leader-election"
            - "--timeout=1200s"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: liveness-probe
          image: feyico/livenessprobe:v2.10.0
          args:
            - --csi-address=/csi/csi.sock
            - --probe-timeout=3s
            - --health-port=29652
            - --v=2
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: nfs
          image: feyico/nfsplugin:canary
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          imagePullPolicy: IfNotPresent
          args:
            - "-v=5"
            - "--nodeid=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
          ports:
            - containerPort: 29652
              name: healthz
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 30
          volumeMounts:
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: "Bidirectional"
            - mountPath: /csi
              name: socket-dir
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
      volumes:
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory
        - name: socket-dir
          emptyDir: {}

csi-nfs-node.yaml

---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-nfs-node
  namespace: kube-system
spec:
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app: csi-nfs-node
  template:
    metadata:
      labels:
        app: csi-nfs-node
    spec:
      hostNetwork: true  # original nfs connection would be broken without hostNetwork setting
      dnsPolicy: ClusterFirstWithHostNet  # available values: Default, ClusterFirstWithHostNet, ClusterFirst
      serviceAccountName: csi-nfs-node-sa
      priorityClassName: system-node-critical
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - operator: "Exists"
      containers:
        - name: liveness-probe
          image: feyico/livenessprobe:v2.10.0
          args:
            - --csi-address=/csi/csi.sock
            - --probe-timeout=3s
            - --health-port=29653
            - --v=2
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: node-driver-registrar
          image: feyico/csi-node-driver-registrar:v2.8.0
          args:
            - --v=2
            - --csi-address=/csi/csi.sock
            - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
          livenessProbe:
            exec:
              command:
                - /csi-node-driver-registrar
                - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
                - --mode=kubelet-registration-probe
            initialDelaySeconds: 30
            timeoutSeconds: 15
          env:
            - name: DRIVER_REG_SOCK_PATH
              value: /var/lib/kubelet/plugins/csi-nfsplugin/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: nfs
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: feyico/nfsplugin:canary
          args:
            - "-v=5"
            - "--nodeid=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
          ports:
            - containerPort: 29653
              name: healthz
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 30
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: "Bidirectional"
          resources:
            limits:
              memory: 300Mi
            requests:
              cpu: 10m
              memory: 20Mi
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/csi-nfsplugin
            type: DirectoryOrCreate
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory
        - hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: Directory
          name: registration-dir

Apply the manifests in turn:

kubectl apply -f rbac-csi-nfs.yaml
kubectl apply -f csi-nfs-driverinfo.yaml
kubectl apply -f csi-nfs-controller.yaml
kubectl apply -f csi-nfs-node.yaml

Once everything is applied, the csi-nfs-controller Deployment and the csi-nfs-node DaemonSet pods should be Running in the kube-system namespace.

Usage

PV Provisioning

Dynamic

Create a StorageClass

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: 127.0.0.1
  share: /data/nfsshare
  # csi.storage.k8s.io/provisioner-secret is only needed for providing mountOptions in DeleteVolume
  # csi.storage.k8s.io/provisioner-secret-name: "mount-options"
  # csi.storage.k8s.io/provisioner-secret-namespace: "default"
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1

Create a PVC

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc1-delete-dyn
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: nfs-csi

Static

Create a PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  mountOptions:
    - hard
    - nfsvers=3
  csi:
    driver: nfs.csi.k8s.io
    readOnly: false
    volumeHandle: unique-volumeid  # make sure it's a unique id in the cluster
    volumeAttributes:
      server: 127.0.0.1
      share: /data/nfsshare

Using It in a Deployment

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-deployment-nfs
spec:
  accessModes:
    - ReadWriteMany  # In this example, multiple Pods consume the same PVC.
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-csi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-nfs
spec:
  replicas: 1
  selector:
    matchLabels:
      name: deployment-nfs
  template:
    metadata:
      name: deployment-nfs
      labels:
        name: deployment-nfs
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
        - name: deployment-nfs
          image: nginx:1.19.5
          imagePullPolicy: IfNotPresent
          command:
            - "/bin/bash"
            - "-c"
            - set -euo pipefail; while true; do echo $(hostname) $(date) >> /mnt/nfs/outfile; sleep 1; done
          volumeMounts:
            - name: nfs
              mountPath: "/mnt/nfs"
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: pvc-deployment-nfs

NFS-CSI Lifecycle

  1. Register:

The following steps are performed by the node-driver-registrar container in the csi-nfs-node DaemonSet pods.

The registration container, node-driver-registrar, mounts the kubelet's plugins_registry hostPath directory (path: /var/lib/kubelet/plugins_registry/) and places a registration socket there (nfs.csi.k8s.io-reg.sock).

It also mounts the kubelet's plugins hostPath directory (path: /var/lib/kubelet/plugins), creates the csi-nfsplugin directory, and places the plugin's socket inside it (csi-nfsplugin/csi.sock); all subsequent RPC calls to the plugin go through this socket.

The RPC service is then started. Through the kubelet's plugin watcher feature, which watches this designated directory, the storage plugin is detected automatically; the driver's information is then obtained via the Identity RPC service, and the registration completes.
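A hedged sketch of the "start the RPC service" step: the plugin listens on its Unix socket and registers the three CSI services (the server types stand in for the sketches shown earlier; ControllerServer is sketched further below):

// Hypothetical sketch: serving the CSI Identity/Controller/Node services
// on the plugin's Unix socket.
package main

import (
	"net"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

func main() {
	sock := "/var/lib/kubelet/plugins/csi-nfsplugin/csi.sock"
	_ = os.Remove(sock) // clear a stale socket left by a previous run
	lis, err := net.Listen("unix", sock)
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer()
	csi.RegisterIdentityServer(srv, &IdentityServer{})     // sketched earlier
	csi.RegisterControllerServer(srv, &ControllerServer{}) // sketched below
	csi.RegisterNodeServer(srv, &NodeServer{})             // sketched earlier
	if err := srv.Serve(lis); err != nil {
		panic(err)
	}
}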

  2. Provision:

The following is performed by the csi-provisioner container in the csi-nfs-controller Deployment pod.

The External Provisioner watches the apiserver for the creation of PVC resources. If the PVC's StorageClass names the plugin we started above (i.e. nfs.csi.k8s.io) as its provisioner, the External Provisioner calls the plugin's Controller CreateVolume() service. Its main job is to connect to the NFS server as an NFS client, carve out a directory and set its permissions, and return the network mount path (i.e. the path of this PV's directory on the NFS server).
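A hedged sketch of what that CreateVolume can do internally: mount the export, create a per-PV subdirectory, and encode enough information in the volume ID for the node plugin to mount it later (the paths, permissions, and ID scheme are assumptions, not the literal csi-driver-nfs code):

// Hypothetical sketch of CreateVolume for an NFS driver.
package nfs

import (
	"context"
	"os"
	"path/filepath"
	"strings"

	"github.com/container-storage-interface/spec/lib/go/csi"
	mount "k8s.io/mount-utils"
)

type ControllerServer struct {
	mounter mount.Interface // e.g. mount.New("")
}

func (cs *ControllerServer) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	server := req.GetParameters()["server"] // from the StorageClass parameters
	share := req.GetParameters()["share"]

	// Mount the export to a temporary working directory on the controller.
	work, err := os.MkdirTemp("", "csi-nfs-create-*")
	if err != nil {
		return nil, err
	}
	if err := cs.mounter.Mount(server+":"+share, work, "nfs", nil); err != nil {
		return nil, err
	}
	defer cs.mounter.Unmount(work)

	// Carve out the per-volume directory inside the export.
	if err := os.MkdirAll(filepath.Join(work, req.GetName()), 0o777); err != nil {
		return nil, err
	}

	// Encode "server/exported/path/subdir" so the node plugin can mount it later.
	volumeID := strings.Join([]string{server, strings.Trim(share, "/"), req.GetName()}, "/")
	return &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:      volumeID,
			CapacityBytes: 0, // NFS does not enforce a size limit
			VolumeContext: req.GetParameters(),
		},
	}, nil
}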

  3. Attach (NFS skips this phase; block storage is used as the example here):

Deploy the External Attacher. The Attacher watches the apiserver for changes to VolumeAttachment objects. As soon as a new VolumeAttachment appears, the Attacher calls the plugin's ControllerPublishVolume() service. Its main job is to call the storage backend's API to attach the corresponding disk to the node where the pod claiming this PVC/PV has been scheduled. The mount directory is: /var/lib/kubelet/pods/<Pod ID>/volumes/<storage provisioner>/<name>

  4. Mount:

The following is performed by the nfs container in the csi-nfs-node DaemonSet pods.

A mount cannot be performed from inside a remote container, so this work falls to the kubelet. When the kubelet detects that a Mount operation is needed, it goes through the pkg/volume/csi package to call the NodePublishVolume interface of the CSI Node service, completing the volume's Mount phase by mounting the remote directory onto the local server. It then calls the CRI to start the container with the volume parameters, mapping the volume prepared in the previous phase into the directory the container specifies.
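And a hedged sketch of that NodePublishVolume, decoding the volume ID from the CreateVolume sketch above and mounting the export straight into the pod directory (unlike the two-phase variant sketched earlier; again illustrative, not the literal source):

// Hypothetical sketch of NodePublishVolume for an NFS driver.
package nfs

import (
	"context"
	"fmt"
	"os"
	"strings"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

func (ns *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	// The volume ID was encoded as "server/exported/path/subdir" earlier.
	parts := strings.SplitN(req.GetVolumeId(), "/", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("invalid volume id %q", req.GetVolumeId())
	}
	source := parts[0] + ":/" + parts[1] // e.g. 127.0.0.1:/data/nfsshare/pvc-1234

	target := req.GetTargetPath() // /var/lib/kubelet/pods/<pod uid>/volumes/...
	if err := os.MkdirAll(target, 0o750); err != nil {
		return nil, err
	}
	// Mount options come from the PV / StorageClass (e.g. nfsvers=4.1).
	opts := req.GetVolumeCapability().GetMount().GetMountFlags()
	if err := ns.mounter.Mount(source, target, "nfs", opts); err != nil {
		return nil, err
	}
	return &csi.NodePublishVolumeResponse{}, nil
}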

  5. Unmount:

The following is performed by the nfs container in the csi-nfs-node DaemonSet pods.

The kubelet calls the CSI Node service's NodeUnpublishVolume interface to perform the unmount.

  6. Delete (reclaim):

The following is performed by the csi-provisioner container in the csi-nfs-controller Deployment pod.

The csi-provisioner performs reclamation according to the reclaimPolicy defined in the StorageClass. Here it is Delete, so the remote PV is deleted, i.e. the corresponding directory is removed.
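A matching hedged sketch of DeleteVolume under the same assumed volume-ID scheme (continuing the ControllerServer sketch above; imports as in the CreateVolume sketch, plus fmt):

// Hypothetical sketch of DeleteVolume: remove the per-volume directory.
func (cs *ControllerServer) DeleteVolume(ctx context.Context, req *csi.DeleteVolumeRequest) (*csi.DeleteVolumeResponse, error) {
	// The volume ID was encoded as "server/exported/path/subdir" in CreateVolume.
	id := req.GetVolumeId()
	first, last := strings.Index(id, "/"), strings.LastIndex(id, "/")
	if first < 0 || first == last {
		return nil, fmt.Errorf("invalid volume id %q", id)
	}
	server, base, subdir := id[:first], id[first:last], id[last+1:]

	// Mount the export, delete the volume's directory, then unmount.
	work, err := os.MkdirTemp("", "csi-nfs-delete-*")
	if err != nil {
		return nil, err
	}
	if err := cs.mounter.Mount(server+":"+base, work, "nfs", nil); err != nil {
		return nil, err
	}
	defer cs.mounter.Unmount(work)
	if err := os.RemoveAll(filepath.Join(work, subdir)); err != nil {
		return nil, err
	}
	return &csi.DeleteVolumeResponse{}, nil
}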

Source Code Walkthrough

TODO

