Spark on k8s: driver port unreachable from pods across multiple hosts
The current environment runs JupyterLab on Kubernetes. When pyspark or spark-shell is started inside JupyterLab, the YARN task logs report errors because the executors cannot call back to the driver's port.
One option is to hard-code the IP and ports entirely;
another is to resolve them dynamically in code or in the YAML.
Part of the code is as follows:
<dependency>
    <groupId>io.kubernetes</groupId>
    <artifactId>client-java</artifactId>
    <version>4.0.0-beta1</version>
</dependency>
// Update the Service's NodePorts: align targetPort with nodePort and
// export the assigned ports to the pod as environment variables.
AtomicReference<Boolean> needUpdate = new AtomicReference<>(false);
service.getSpec().getPorts().forEach(p -> {
    if (!p.getPort().equals(p.getNodePort())) {
        // Make the pod listen on the NodePort itself.
        p.setTargetPort(new IntOrString(p.getNodePort()));
        needUpdate.set(true);
    }
    if (DRIVER_PORT_NAME.equalsIgnoreCase(p.getName())) {
        // Expose the driver's NodePort to the pod as DRIVERPORT.
        develop.getSparkEnv().add(new V1EnvVarBuilder()
                .withName("DRIVERPORT")
                .withValue(String.valueOf(p.getNodePort()))
                .build());
    } else if (BM_PORT_NAME.equalsIgnoreCase(p.getName())) {
        // Expose the block manager's NodePort as BMPORT.
        develop.getSparkEnv().add(new V1EnvVarBuilder()
                .withName("BMPORT")
                .withValue(String.valueOf(p.getNodePort()))
                .build());
    }
});
// Inject the node's host IP into the pod via the downward API (status.hostIP).
V1ObjectFieldSelector v1ObjectFieldSelector = new V1ObjectFieldSelector();
v1ObjectFieldSelector.setApiVersion("v1");
v1ObjectFieldSelector.setFieldPath("status.hostIP");
V1EnvVarSource v1EnvVarSource = new V1EnvVarSource();
v1EnvVarSource.setFieldRef(v1ObjectFieldSelector);
develop.getSparkEnv().add(new V1EnvVarBuilder()
        .withName("MY_HOST_IP").withValueFrom(v1EnvVarSource).build());
The resulting YAML looks like this:
env:
- name: USERNAME
value: x1
- name: NODESELECTOR
- name: DRIVERPORT
value: "31218"
- name: BMPORT
value: "32095"
- name: MY_HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
The submit command has the following format:
/export/spark-bin-hadoop2.7/bin/pyspark-cust.sh --conf spark.driver.bindAddress=0.0.0.0 --conf spark.driver.host=${MY_HOST_IP} --conf spark.driver.port=${DRIVERPORT} --conf spark.driver.blockManager.port=${BMPORT} --master yarn
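Inside the pod, the injected environment variables are all the launcher needs to assemble those --conf flags. A minimal sketch (the helper name buildDriverConf and the sample values are illustrative, not part of the original setup):

```java
public class DriverConf {
    // Assemble the same --conf flags the submit command above passes,
    // from the values injected into the pod's environment.
    static String buildDriverConf(String hostIp, String driverPort, String bmPort) {
        return String.join(" ",
                "--conf spark.driver.bindAddress=0.0.0.0",    // bind inside the pod
                "--conf spark.driver.host=" + hostIp,         // advertise the node's IP
                "--conf spark.driver.port=" + driverPort,     // driver's NodePort
                "--conf spark.driver.blockManager.port=" + bmPort);
    }

    public static void main(String[] args) {
        // In the pod these would come from System.getenv("MY_HOST_IP"),
        // System.getenv("DRIVERPORT"), System.getenv("BMPORT"); the sample
        // values mirror the YAML above.
        System.out.println(buildDriverConf("10.0.0.12", "31218", "32095"));
    }
}
```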