Spark on K8S deployment modes
- Standalone: launch a long-running Spark cluster on K8S; every job is submitted to that cluster via spark-submit
- Kubernetes Native: spark-submit talks directly to the K8S API server; once resources are granted, Pods are started as the Driver and Executors to run the job. See http://spark.apache.org/docs/2.4.6/running-on-kubernetes.html
- Spark Operator: install the Spark Operator, define a spark-app.yaml, then run kubectl apply -f spark-app.yaml. This declarative API and invocation style is the idiomatic way to use K8S. See https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
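For the Operator mode, a minimal manifest can be written and applied declaratively. The sketch below is an assumption-laden example, not part of this walkthrough: the `v1beta2` API group is the Operator's current one, and the image, jar path, and service account mirror the values used later in this post.

```shell
# Write a minimal (hypothetical) SparkApplication manifest for the Spark Operator.
cat > spark-app.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Java
  mode: cluster
  image: registry.cn-beijing.aliyuncs.com/spark:spark-v2.4.5
  mainClass: org.apache.spark.examples.JavaSparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
  sparkVersion: "2.4.5"
  driver:
    cores: 1
    serviceAccount: spark
  executor:
    cores: 1
    instances: 5
EOF
# Then submit it declaratively:
# kubectl apply -f spark-app.yaml
```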
Deployment walkthrough: Kubernetes Native
Download the Spark distribution
Download URL: https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
This is the build that bundles the Hadoop dependencies.
Build the Spark image
Upload the tarball to the server and unpack it:
tar zxvf spark-2.4.5-bin-hadoop2.7.tgz
# Symlink the unpacked directory to a fixed path
ln -s <path to the unpacked Spark directory> /opt/spark
Build the image:
cd /opt/spark
./bin/docker-image-tool.sh -r registry.cn-beijing.aliyuncs.com -t spark-v2.4.5 build
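Note that docker-image-tool.sh also has a `push` subcommand. The walkthrough below only builds the image locally, which is exactly why the ImagePullBackOff problem appears later; pushing to the registry after building would avoid it. A sketch, assuming you have already run `docker login` against the registry:

```shell
# Sketch: build the Spark images and push them to the registry.
# REPO and TAG match this walkthrough; substitute your own values.
REPO=registry.cn-beijing.aliyuncs.com
TAG=spark-v2.4.5
build_and_push() {
  ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" build
  ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" push
}
# build_and_push
```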
List the locally generated images:
[root@iZ2ze48olpbvnopfiqqk33Z spark]# docker images
Submitting a job
Check the K8S cluster info:
[root@iZ2ze48olpbvnopfiqqk33Z spark]# kubectl cluster-info
Submit the job:
./bin/spark-submit \
    --master k8s://https://1x.3x.6.2xx:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.JavaSparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/spark:spark-v2.4.5 \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
Check the status:
kubectl get all
Check the detailed error log:
[root@iZ2ze48olpbvnopfiqqk33Z ~]# kubectl logs pod/spark-pi-1635131060330-driver
21/10/25 03:04:24 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at org.apache.spark.examples.JavaSparkPi.main(JavaSparkPi.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-1635131060330-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1635131060330-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:447)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:337)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:318)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:833)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:226)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:170)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 19 more
21/10/25 03:09:24 INFO ShutdownHookManager: Shutdown hook called
21/10/25 03:09:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-80b52b4e-a2ca-467d-a60b-a14b8d8ccdba
21/10/25 03:09:24 INFO ShutdownHookManager: Deleting directory /var/data/spark-a8985894-a6f4-488c-9341-a27cffd859ee/spark-9b9abc84-8744-4228-a491-a5f13c747ab7
Fix: the default service account cannot manage pods, so create a dedicated one with sufficient RBAC permissions and submit with it:
1. kubectl create serviceaccount spark
2. kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
3. Add --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark to the spark-submit command
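Before resubmitting, the new permissions can be verified with `kubectl auth can-i`, impersonating the `spark` service account created above. A sketch to run against your own cluster:

```shell
# Check that the `spark` service account can now manage pods in `default`,
# which is what the driver needs in order to create executors.
check_spark_rbac() {
  for verb in get list watch create delete; do
    echo -n "$verb pods: "
    kubectl auth can-i "$verb" pods \
      --as=system:serviceaccount:default:spark -n default
  done
}
# check_spark_rbac    # expect "yes" on every line
```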
Delete the failed job and resubmit:
./bin/spark-submit \
    --master k8s://https://1x.3x.6.xxx:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.JavaSparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/spark:spark-v2.4.5 \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
Executor error
Error: pod stuck in ImagePullBackOff
Check the details:
[root@iZ2ze48olpbvnopfiqqk33Z ~]# kubectl describe pod/spark-pi-1635139347481-driver
Name: spark-pi-1635139347481-driver
Namespace: default
Priority: 0
Node: cn-beijing.1x.3x.6.2/1x.3x.6.2
Start Time: Mon, 25 Oct 2021 13:22:28 +0800
Labels: spark-app-selector=spark-622bf6defc67445da34ebb43947473ac
spark-role=driver
Annotations: kubernetes.io/psp: ack.privileged
Status: Running
IP: 172.22.113.222
IPs:
IP: 172.22.113.222
Containers:
spark-kubernetes-driver:
Container ID: docker://8a967fd9711cf2f784cdafda4db4109afd56daf50b35ba482cd7e5d0bbc06d1e
Image: registry.cn-beijing.aliyuncs.com/spark:spark-v2.4.5
Image ID: docker://sha256:75733d0f823832c555bdf4c6412587fe340838db9eeea925b74178f3216b3f2c
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
org.apache.spark.examples.JavaSparkPi
spark-internal
State: Running
Started: Mon, 25 Oct 2021 13:22:29 +0800
Ready: True
Restart Count: 0
Limits:
memory: 1408Mi
Requests:
cpu: 1
memory: 1408Mi
Environment:
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
SPARK_LOCAL_DIRS: /var/data/spark-0ba71901-ccc9-4f59-ac8f-b5fbc06509d2
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/opt/spark/conf from spark-conf-volume (rw)
/var/data/spark-0ba71901-ccc9-4f59-ac8f-b5fbc06509d2 from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-token-rw6sc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
spark-conf-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-pi-1635139347481-driver-conf-map
Optional: false
spark-token-rw6sc:
Type: Secret (a volume populated by a Secret)
SecretName: spark-token-rw6sc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
The pull fails because the image was never pushed to the registry, so the node scheduled to run the pod has no way to fetch it.
Fix:
Manually copy the locally built image to every node, and when submitting the job set a config flag so spark-submit only uses the local image cache.
- Export the generated image
docker save -o spark.tar registry.cn-beijing.aliyuncs.com/regis-k/vicky:spark-push-v2.4.5 \
    registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
- Copy the archive to every node
scp spark.tar shadow@1x.3x.6.1:/home/shadow
…
- On each node, load the image into the local Docker image store
[root@iZ2ze48olpbvnopfiqqk2zZ shadow]# docker load < spark.tar
e8b689711f21: Loading layer [==================================================>] 83.86MB/83.86MB
2bf2b8c78141: Loading layer [==================================================>] 5.177MB/5.177MB
f3fd6088fa34: Loading layer [==================================================>] 3.584kB/3.584kB
8138b5fec066: Loading layer [==================================================>] 210.9MB/210.9MB
4f731e722019: Loading layer [==================================================>] 25.31MB/25.31MB
3a022d792160: Loading layer [==================================================>] 241MB/241MB
4ae4c587a876: Loading layer [==================================================>] 73.73kB/73.73kB
04094b33ae8b: Loading layer [==================================================>] 58.88kB/58.88kB
33fe39ef5ffe: Loading layer [==================================================>] 6.144kB/6.144kB
2805f97e9297: Loading layer [==================================================>] 3.942MB/3.942MB
1d877b4ff939: Loading layer [==================================================>] 9.728kB/9.728kB
12ff6a9017ed: Loading layer [==================================================>] 1.016MB/1.016MB
Loaded image: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
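The manual save/scp/load steps above can be wrapped in a small loop. In this sketch the node list and the remote user are placeholders standing in for this environment:

```shell
# Sketch: ship the locally built image archive to every worker node and load it there.
IMAGE=registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
NODES="1x.3x.6.1 1x.3x.6.2"    # placeholder node addresses
REMOTE_USER=shadow             # placeholder remote account

distribute_image() {
  docker save -o spark.tar "$IMAGE"
  for node in $NODES; do
    scp spark.tar "$REMOTE_USER@$node:/home/$REMOTE_USER/"
    ssh "$REMOTE_USER@$node" "docker load < /home/$REMOTE_USER/spark.tar"
  done
}
# distribute_image
```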
- Resubmit the job
./bin/spark-submit \
--master k8s://https://1x.3x.6.xxx:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.JavaSparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Never \
--conf spark.kubernetes.container.image=registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5 \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
- Check the job: the driver pod is now created and the program runs to completion
- Run log:
[root@iZ2ze48olpbvnopfiqqk33Z spark]# ./bin/spark-submit --master k8s://https://1x.3x.6.xxx:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.JavaSparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image.pullPolicy=Never --conf spark.kubernetes.container.image=registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5 local:opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/27 14:06:37 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1635314796749-driver
namespace: default
labels: spark-app-selector -> spark-946d374c98b6426cb38bdc7d44750175, spark-role -> driver
pod uid: bf662da1-335a-4680-b08f-fd52392f0649
creation time: 2021-10-27T06:06:37Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-rw6sc
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
21/10/27 14:06:37 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1635314796749-driver
namespace: default
labels: spark-app-selector -> spark-946d374c98b6426cb38bdc7d44750175, spark-role -> driver
pod uid: bf662da1-335a-4680-b08f-fd52392f0649
creation time: 2021-10-27T06:06:37Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-rw6sc
node name: cn-beijing.1x.3x.6.2
start time: N/A
container images: N/A
phase: Pending
status: []
21/10/27 14:06:37 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1635314796749-driver
namespace: default
labels: spark-app-selector -> spark-946d374c98b6426cb38bdc7d44750175, spark-role -> driver
pod uid: bf662da1-335a-4680-b08f-fd52392f0649
creation time: 2021-10-27T06:06:37Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-rw6sc
node name: cn-beijing.1x.3x.6.2
start time: 2021-10-27T06:06:37Z
container images: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
phase: Pending
status: [ContainerStatus(containerID=null, image=registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={started=false})]
21/10/27 14:06:37 INFO Client: Waiting for application spark-pi to finish...
21/10/27 14:06:38 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1635314796749-driver
namespace: default
labels: spark-app-selector -> spark-946d374c98b6426cb38bdc7d44750175, spark-role -> driver
pod uid: bf662da1-335a-4680-b08f-fd52392f0649
creation time: 2021-10-27T06:06:37Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-rw6sc
node name: cn-beijing.1x.3x.6.2
start time: 2021-10-27T06:06:37Z
container images: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
phase: Running
status: [ContainerStatus(containerID=docker://db5681643d2a34b56bb5706a965bc732877b77e70162d479deff269340d19f3f, image=registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5, imageID=docker-pullable://registry.cn-beijing.aliyuncs.com/regis-k/vicky@sha256:4105a09b45d9648e1a757538c0df2d482e8d58fae752d961a88486ecbbf9f24e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2021-10-27T06:06:38Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={started=true})]
21/10/27 14:06:44 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1635314796749-driver
namespace: default
labels: spark-app-selector -> spark-946d374c98b6426cb38bdc7d44750175, spark-role -> driver
pod uid: bf662da1-335a-4680-b08f-fd52392f0649
creation time: 2021-10-27T06:06:37Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-rw6sc
node name: cn-beijing.1x.3x.6.2
start time: 2021-10-27T06:06:37Z
container images: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
phase: Succeeded
status: [ContainerStatus(containerID=docker://db5681643d2a34b56bb5706a965bc732877b77e70162d479deff269340d19f3f, image=registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5, imageID=docker-pullable://registry.cn-beijing.aliyuncs.com/regis-k/vicky@sha256:4105a09b45d9648e1a757538c0df2d482e8d58fae752d961a88486ecbbf9f24e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://db5681643d2a34b56bb5706a965bc732877b77e70162d479deff269340d19f3f, exitCode=0, finishedAt=2021-10-27T06:06:44Z, message=null, reason=Completed, signal=null, startedAt=2021-10-27T06:06:38Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={started=false})]
21/10/27 14:06:44 INFO LoggingPodStatusWatcherImpl: Container final statuses:
Container name: spark-kubernetes-driver
Container image: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
Container state: Terminated
Exit code: 0
21/10/27 14:06:44 INFO Client: Application spark-pi finished.
21/10/27 14:06:44 INFO ShutdownHookManager: Shutdown hook called
21/10/27 14:06:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-74992018-cfa7-48b2-934a-2c26d18195e6
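Once the run finishes, the result can be confirmed from the driver log: JavaSparkPi prints a line of the form "Pi is roughly 3.14...". A sketch (the driver pod name changes on every submission, so it is looked up by label):

```shell
# Find the most recently created driver pod and grep its log for the result line.
check_pi_result() {
  driver=$(kubectl get pods -l spark-role=driver \
    --sort-by=.metadata.creationTimestamp -o name | tail -n 1)
  kubectl logs "$driver" | grep "Pi is roughly"
}
# check_pi_result
```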