Kubernetes Operator for Apache Spark Design
Introduction

In Spark 2.3, Kubernetes became an official scheduler backend for Spark, in addition to the standalone scheduler, Mesos, and YARN. Compared with the alternative approach of deploying a standalone Spark cluster on top of Kubernetes and submitting applications to run on that standalone cluster, having Kubernetes as a native scheduler backend offers some important benefits, as discussed in SPARK-18278, and is a huge leap forward. However, the way the life cycle of Spark applications is managed, e.g., how applications get submitted to run on Kubernetes and how application status is tracked, is vastly different from that of other types of workloads on Kubernetes, e.g., Deployments, DaemonSets, and StatefulSets. The Kubernetes Operator for Apache Spark reduces this gap and allows Spark applications to be specified, run, and monitored idiomatically on Kubernetes.

Specifically, the Kubernetes Operator for Apache Spark follows the recent trend of leveraging the operator pattern for managing the life cycle of Spark applications on a Kubernetes cluster. The operator allows Spark applications to be specified in a declarative manner (e.g., in a YAML file) and run without the need to deal with the spark-submit process. It also enables the status of Spark applications to be tracked and presented idiomatically, like other types of workloads on Kubernetes.
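For illustration, a minimal SparkApplication manifest might look like the following. This is a sketch only: the field names follow the operator's v1beta2 CRD, while the image, jar path, Spark version, and service account are placeholders to adapt to your cluster.

# Illustrative SparkApplication manifest; image, jar path, and versions are placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "<your-spark-image>"        # a Spark image built for Kubernetes
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark            # service account with permission to manage executor pods
  executor:
    instances: 2
    cores: 1
    memory: "512m"

Creating this object (e.g., with kubectl or sparkctl) is all that is needed; the operator takes care of translating it into a spark-submit invocation, as described in the architecture below.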

Architecture

The operator consists of:

  • a SparkApplication controller that watches events of creation, updates, and deletion of SparkApplication objects and acts on the watch events,
  • a submission runner that runs spark-submit for submissions received from the controller,
  • a Spark pod monitor that watches for Spark pods and sends pod status updates to the controller,
  • a Mutating Admission Webhook that handles customizations for Spark driver and executor pods based on the annotations on the pods added by the controller (a sketch of such a customization follows this list),
  • and a command-line tool named sparkctl for working with the operator.
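As a sketch of the kind of customization the webhook applies, a SparkApplication spec can declare extra volumes and volume mounts for the driver and executor pods; the controller records the request and the webhook patches the pods accordingly. Field names follow the operator's v1beta2 CRD; the ConfigMap name and mount path here are hypothetical.

# Illustrative fragment of a SparkApplication spec requesting a pod customization
# that is applied by the mutating admission webhook.
spec:
  volumes:
    - name: config-vol
      configMap:
        name: my-spark-config        # hypothetical ConfigMap holding extra configuration
  driver:
    volumeMounts:
      - name: config-vol
        mountPath: /etc/spark/conf-extra
  executor:
    volumeMounts:
      - name: config-vol
        mountPath: /etc/spark/conf-extra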

[Figure: architecture of the Kubernetes Operator for Apache Spark]
Specifically, a user uses sparkctl (or kubectl) to create a SparkApplication object. The SparkApplication controller receives the object through a watcher from the API server, creates a submission carrying the spark-submit arguments, and sends the submission to the submission runner. The submission runner submits the application to run and creates the driver pod of the application. Upon starting, the driver pod creates the executor pods. While the application is running, the Spark pod monitor watches the pods of the application and sends status updates of the pods back to the controller, which then updates the status of the application accordingly.
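The result of this loop is visible on the SparkApplication object itself. A rough sketch of the status the controller maintains is shown below; the field names approximate the operator's status subresource, and all values are made up.

# Illustrative status recorded on a SparkApplication object by the controller.
status:
  sparkApplicationId: spark-0123456789abcdef   # hypothetical Spark application ID
  applicationState:
    state: RUNNING
  driverInfo:
    podName: spark-pi-driver
    webUIServiceName: spark-pi-ui-svc
  executorState:
    spark-pi-exec-1: RUNNING
    spark-pi-exec-2: RUNNING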
