Deploying Spark on Kubernetes
1. Install a Kubernetes cluster
2. Deploy a Spark cluster in standalone mode
helm repo add my-repo https://charts.bitnami.com/bitnami
helm install my-release my-repo/spark
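The chart's defaults can be overridden at install time. As a sketch (worker.replicaCount is a value exposed by the bitnami/spark chart; verify the current value names before relying on them):

```shell
# Inspect the chart's configurable values first.
helm show values my-repo/spark | head -n 40

# Example: install with an explicit number of worker replicas.
helm install my-release my-repo/spark \
  --set worker.replicaCount=2
```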
3. Check the Pod status
[root@node-04 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
my-release-spark-master-0 1/1 Running 0 19m 10.244.1.174 node-05 <none> <none>
my-release-spark-worker-0 1/1 Running 0 19m 10.244.1.175 node-05 <none> <none>
my-release-spark-worker-1 1/1 Running 0 18m 10.244.0.107 node-04 <none> <none>
4. Check the Services
[root@node-04 spark-3.3.1-bin-hadoop3]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-release-spark-headless ClusterIP None <none> <none> 70s
my-release-spark-master-svc ClusterIP 10.104.26.1 <none> 7077/TCP,80/TCP 70s
5. Connect with pyspark
[root@node-04 ~]# kubectl exec my-release-spark-master-0 -it -- pyspark --conf spark.driver.bindAddress=10.244.1.174 --conf spark.driver.host=10.244.1.174
Python 3.8.15 (default, Oct 25 2022, 21:42:05)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/11/12 12:22:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.3.1
/_/
Using Python version 3.8.15 (default, Oct 25 2022 21:42:05)
Spark context Web UI available at http://10.244.1.174:4040
Spark context available as 'sc' (master = local[*], app id = local-1668255759100).
SparkSession available as 'spark'.
>>>
Run a pyspark word-count script:
# Run inside the pyspark shell, where SparkContext is already imported.
# Note: the string must keep a space before 'lazy', otherwise a bare
# backslash continuation joins "the" and "lazy" into one token "thelazy".
words = ('the quick brown fox jumps over the '
         'lazy dog the quick brown fox jumps over the lazy dog')
sc = SparkContext.getOrCreate()
seq = words.split()
data = sc.parallelize(seq)  # distribute the token list as an RDD
# classic word count: emit (word, 1) pairs, then sum counts per word
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
dict(counts)
sc.stop()
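As a sanity check, the same result can be reproduced in plain Python with collections.Counter, which computes locally what the map/reduceByKey pipeline computes in a distributed way:

```python
from collections import Counter

words = ('the quick brown fox jumps over the '
         'lazy dog the quick brown fox jumps over the lazy dog')

# Counter over the split tokens yields the same (word, count) mapping
# that map(word -> (word, 1)) + reduceByKey(+) produces on the RDD.
counts = dict(Counter(words.split()))
print(counts)  # e.g. {'the': 4, 'quick': 2, ...}
```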
6. View the Spark web UI on port 8080
[root@node-04 spark-3.3.1-bin-hadoop3]# kubectl port-forward --address 0.0.0.0 service/my-release-spark-master-svc 8080:80
Forwarding from 0.0.0.0:8080 -> 8080
Handling connection for 8080
Handling connection for 8080
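With the port-forward running, a quick way to confirm the master UI is reachable (assumes curl is available on the node; the page title normally mentions "Spark Master"):

```shell
# Fetch the forwarded master UI and show its title as a smoke test.
curl -s http://localhost:8080 | grep -i '<title>'
```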
7. Submit a job with spark-submit
Download Spark and configure its environment variables; once configured, verify that spark-shell
can start a Spark session. The environment variables configured locally:
export SPARK_HOME=/data/packages/spark-3.3.1-bin-hadoop3
export PATH=$SPARK_HOME/bin:$PATH
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=bitnami/spark:3.3.1-debian-11-r5 \
--master k8s://https://10.127.91.1:6443 \
--conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-release-spark-master-svc:7077 \
--deploy-mode cluster \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.1.jar 1000
Parameter notes
- bitnami/spark:3.3.1-debian-11-r5 is the base container image used for the executors.
- k8s://https://10.127.91.1:6443 is the Kubernetes API server address and port, which you can obtain with kubectl cluster-info.
- The jar referenced by local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.1.jar must exist inside the bitnami/spark:3.3.1-debian-11-r5 image, so that it is present on every running worker.
- spark://my-release-spark-master-svc:7077 is the master Service address, which you can obtain with kubectl get svc -n <namespace>.
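The SparkPi example submitted above estimates π by Monte Carlo sampling: it throws random points into a square and counts how many land inside the inscribed circle. A plain-Python sketch of the same computation (the Spark version simply parallelizes the sampling across executors):

```python
import random


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi, mirroring what SparkPi computes."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # area ratio (quarter circle / unit square) = pi / 4
    return 4.0 * inside / num_samples


print(estimate_pi(100_000))
```

With 100,000 samples the estimate lands close to 3.14; increasing the sample count (the trailing `1000` argument in the spark-submit command controls the number of partitions/slices in SparkPi) tightens the estimate.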
References
https://github.com/bitnami/charts/tree/main/bitnami/spark
https://testdriven.io/blog/deploying-spark-on-kubernetes/
https://www.oak-tree.tech/blog/spark-kubernetes-primer