1. Install a Kubernetes cluster

2. Deploy a Spark cluster in standalone mode

helm repo add my-repo https://charts.bitnami.com/bitnami
helm install my-release my-repo/spark

3. Check the Pod status

[root@node-04 ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
my-release-spark-master-0   1/1     Running   0          19m   10.244.1.174   node-05   <none>           <none>
my-release-spark-worker-0   1/1     Running   0          19m   10.244.1.175   node-05   <none>           <none>
my-release-spark-worker-1   1/1     Running   0          18m   10.244.0.107   node-04   <none>           <none>
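
The status check above can also be scripted. The following is a minimal Python sketch, not part of the original setup: it parses the plain-text output of kubectl get pod -o wide and lists any Pod whose STATUS is not Running or whose READY count is incomplete. The function name pods_not_ready is hypothetical, and the sample text is copied from the output above.

```python
from __future__ import annotations


def pods_not_ready(kubectl_output: str) -> list[str]:
    """Return names of Pods that are not fully ready, from `kubectl get pod` text."""
    not_ready = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        # The first three columns are NAME, READY (e.g. "1/1") and STATUS.
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            not_ready.append(name)
    return not_ready


SAMPLE = """\
NAME                        READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
my-release-spark-master-0   1/1     Running   0          19m   10.244.1.174   node-05   <none>           <none>
my-release-spark-worker-0   1/1     Running   0          19m   10.244.1.175   node-05   <none>           <none>
my-release-spark-worker-1   1/1     Running   0          18m   10.244.0.107   node-04   <none>           <none>
"""

if __name__ == "__main__":
    # An empty list means every Pod in the sample is Running and ready.
    print(pods_not_ready(SAMPLE))
```

In practice the text would come from running kubectl yourself, for example through subprocess; that step is left out here so the sketch runs without a cluster.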

4. Check the Services

[root@node-04 spark-3.3.1-bin-hadoop3]# kubectl get svc
NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
my-release-spark-headless     ClusterIP   None             <none>        <none>            70s
my-release-spark-master-svc   ClusterIP   10.104.26.1      <none>        7077/TCP,80/TCP   70s

5. Connect with pyspark

[root@node-04 ~]# kubectl exec my-release-spark-master-0 -it -- pyspark --conf spark.driver.bindAddress=10.244.1.174 --conf spark.driver.host=10.244.1.174
Python 3.8.15 (default, Oct 25 2022, 21:42:05) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/11/12 12:22:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Python version 3.8.15 (default, Oct 25 2022 21:42:05)
Spark context Web UI available at http://10.244.1.174:4040
Spark context available as 'sc' (master = local[*], app id = local-1668255759100).
SparkSession available as 'spark'.
>>> 

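Run a pyspark script

Note that the banner above reports master = local[*], so this shell runs in local mode inside the master Pod rather than on the standalone workers. To run against the cluster, add --master spark://my-release-spark-master-svc:7077 to the pyspark command.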

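from pyspark import SparkContext

# The backslash continues the literal; split() below absorbs the extra whitespace.
words = 'the quick brown fox jumps over the\
        lazy dog the quick brown fox jumps over the lazy dog'
sc = SparkContext.getOrCreate()  # reuses the shell's existing context
seq = words.split()
data = sc.parallelize(seq)
# Pair each word with 1, then sum the counts per word.
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
print(dict(counts))
sc.stop()  # also ends the context provided by the pyspark shell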
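
To sanity-check the result without a cluster, the same count can be reproduced in plain Python. This sketch is an illustration rather than part of the original walkthrough: collections.Counter stands in for the map and reduceByKey steps, and the variable names are reused from the script above.

```python
from collections import Counter

words = 'the quick brown fox jumps over the\
        lazy dog the quick brown fox jumps over the lazy dog'

# split() with no arguments splits on any run of whitespace, so the
# indentation carried into the literal by the line continuation is harmless.
seq = words.split()

# Counter performs the same per-key summation that
# map(word -> (word, 1)) followed by reduceByKey(a + b) performs in Spark.
expected = dict(Counter(seq))

if __name__ == "__main__":
    print(expected)
```

Here 'the' is counted four times and each of the other seven words twice. The Spark result should contain the same pairs, although the order returned by collect() may differ.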

6. View the Spark web UI on port 8080

[root@node-04 spark-3.3.1-bin-hadoop3]# kubectl port-forward --address 0.0.0.0 service/my-release-spark-master-svc 8080:80
Forwarding from 0.0.0.0:8080 -> 8080
Handling connection for 8080
Handling connection for 8080
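
While the port-forward is running, a quick reachability check can confirm that something is listening before opening a browser. The sketch below uses only the Python standard library. The helper name port_is_open and the address 127.0.0.1:8080 are assumptions chosen to match the command above, and a successful TCP connect does not by itself prove that the Spark UI is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080 is the local port chosen in the port-forward command above.
    print(port_is_open("127.0.0.1", 8080))
```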

7. Submit a job with spark-submit

Download Spark and configure its environment variables, then confirm that spark-shell starts correctly. The environment variables used in this local setup are:

export SPARK_HOME=/data/packages/spark-3.3.1-bin-hadoop3
export PATH=$SPARK_HOME/bin:$PATH

Then submit the job from $SPARK_HOME:

./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=bitnami/spark:3.3.1-debian-11-r5 \
    --master k8s://https://10.127.91.1:6443 \
    --conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-release-spark-master-svc:7077 \
    --deploy-mode cluster \
    local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.1.jar 1000

Parameter notes

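  • bitnami/spark:3.3.1-debian-11-r5 is the base container image used to run the job.
  • k8s://https://10.127.91.1:6443 is the Kubernetes API server address and port, which can be obtained with kubectl cluster-info.
  • The jar in local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.1.jar must already exist inside the bitnami/spark:3.3.1-debian-11-r5 image, so that it is present on every running worker.
  • spark://my-release-spark-master-svc:7077 can be obtained with kubectl get svc -n <namespace>.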
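
The parameters above can be assembled programmatically, which makes it clearer which values are cluster-specific. The following Python sketch is illustrative only: the function names master_url_from_cluster_info and build_spark_submit are hypothetical, and the parsing assumes that kubectl cluster-info prints the API server address as the first http(s) URL in its output, which should be checked against your kubectl version.

```python
import re
import shlex


def master_url_from_cluster_info(text: str) -> str:
    """Take the first http(s) URL in `kubectl cluster-info` text and prefix k8s://."""
    # Strip ANSI colour codes that kubectl may emit on a terminal.
    plain = re.sub(r"\x1b\[[0-9;]*m", "", text)
    match = re.search(r"https?://[^\s/]+", plain)
    if match is None:
        raise ValueError("no API server URL found in cluster-info output")
    return "k8s://" + match.group(0)


def build_spark_submit(master: str, image: str, spark_master_url: str,
                       jar: str, app_args: list) -> list:
    """Build the argument list for the spark-submit call shown above."""
    return [
        "./bin/spark-submit",
        "--class", "org.apache.spark.examples.SparkPi",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--master", master,
        "--conf", f"spark.kubernetes.driverEnv.SPARK_MASTER_URL={spark_master_url}",
        "--deploy-mode", "cluster",
        jar,
        *app_args,
    ]


if __name__ == "__main__":
    # Assumed sample line; the address is the one used in this article.
    info = "Kubernetes control plane is running at https://10.127.91.1:6443"
    cmd = build_spark_submit(
        master=master_url_from_cluster_info(info),
        image="bitnami/spark:3.3.1-debian-11-r5",
        spark_master_url="spark://my-release-spark-master-svc:7077",
        jar="local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.1.jar",
        app_args=["1000"],
    )
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument separate, so the same value can be passed to subprocess.run without extra shell quoting.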

References
https://github.com/bitnami/charts/tree/main/bitnami/spark
https://testdriven.io/blog/deploying-spark-on-kubernetes/
https://www.oak-tree.tech/blog/spark-kubernetes-primer
