Preliminaries

Before installing HBase, we installed Hadoop, since HBase keeps its data in HDFS.
Spark is also related to Hadoop, but only in that it uses Hadoop's libraries; it does not depend on a running Hadoop deployment. No Hadoop installation is needed for Spark, only a JDK.
Spark supports four cluster modes: standalone, Mesos, YARN, and Kubernetes.
Given our data volume, we settled on the simplest one, standalone mode.

Download

https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
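
If you prefer to script the download, the Apache archive also hosts this release at a direct URL (the archive.apache.org path below is an assumption based on Apache's standard directory layout; the mirror link above is the official entry point):

# Download and unpack Spark 2.4.3 prebuilt for Hadoop 2.7
# (archive.apache.org path assumed from Apache's standard layout)
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz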

#Docker base image

FROM ubuntu:16.04
# Use a local apt mirror list
COPY sources.list /etc/apt/
# Update and install in one layer so the package index is never stale
RUN apt-get update && apt-get install -y vim tzdata
# Set the container timezone to Asia/Shanghai
RUN rm /etc/localtime && ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone
ENV TZ="Asia/Shanghai"

WORKDIR /
# Copy a locally unpacked JDK 8 into the image and put it on PATH
COPY jdk1.8.0_171 /jdk1.8.0_171
ENV JAVA_HOME=/jdk1.8.0_171
ENV PATH=$PATH:/jdk1.8.0_171/bin
# Symlink so java resolves even for processes that don't inherit PATH
RUN ln -s /jdk1.8.0_171/bin/java /usr/bin/java

#Install Spark

WORKDIR /spark
# Copy the unpacked Spark distribution into the image
COPY spark-2.4.3-bin-hadoop2.7 .
ENV SPARK_HOME=/spark
ENV PATH=$PATH:/spark/bin
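
Assuming the two snippets above make up a single Dockerfile sitting next to the unpacked jdk1.8.0_171 and spark-2.4.3-bin-hadoop2.7 directories (plus sources.list), building the image with the tag used by the start scripts below is just:

# Run from the directory containing the Dockerfile and the unpacked JDK/Spark
docker build -t sjfxspark:v1 .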

#Configure the Spark ports

mkdir -p /home/mo/sjfx-spark-data
cp -r spark-2.4.3-bin-hadoop2.7/conf /home/mo/sjfx-spark-data/config
mv /home/mo/sjfx-spark-data/config/spark-env.sh.template /home/mo/sjfx-spark-data/config/spark-env.sh

Edit spark-env.sh and add:

export SPARK_MASTER_PORT=5030
export SPARK_MASTER_WEBUI_PORT=5040
export SPARK_WORKER_PORT=5031
export SPARK_WORKER_WEBUI_PORT=5041
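
Since the containers run with --net=host, these ports bind directly on the host's interfaces. If the host has more than one interface, it can also help to pin the bind address; SPARK_MASTER_HOST is the standard spark-env.sh variable for that (the IP below assumes the host used throughout this post):

# Optional: pin the master's bind address (assumed host IP)
export SPARK_MASTER_HOST=192.168.1.26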

#Start the master

#!/bin/sh
docker stop sjfxspark-master
docker rm sjfxspark-master
# --net=host so the master's fixed ports bind directly on the host;
# conf/logs/work are bind-mounted so they survive container rebuilds.
# start-master.sh daemonizes, so tail keeps the container running.
docker run -d --name sjfxspark-master --net=host \
  -v /home/mo/sjfx-spark-data/config:/spark/conf  \
  -v /home/mo/sjfx-spark-data/logs:/spark/logs  \
  -v /home/mo/sjfx-spark-data/work:/spark/work  \
  sjfxspark:v1 sh -c "/spark/sbin/start-master.sh && tail -f /dev/null"

Check that the master web UI is up: http://192.168.1.26:5040
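
The same information is available from a shell: the standalone master's web UI also serves a JSON summary (handy when no browser is at hand):

# Query the master's state (workers, applications) as JSON
curl http://192.168.1.26:5040/json/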

#Start the slave

#!/bin/sh
docker stop sjfxspark-slave
docker rm sjfxspark-slave
# Same mounts as the master; the worker registers with the master at port 5030
docker run -d --name sjfxspark-slave --net=host \
  -v /home/mo/sjfx-spark-data/config:/spark/conf  \
  -v /home/mo/sjfx-spark-data/logs:/spark/logs  \
  -v /home/mo/sjfx-spark-data/work:/spark/work  \
  sjfxspark:v1 sh -c "/spark/sbin/start-slave.sh spark://192.168.1.26:5030 && tail -f /dev/null"

Check the worker web UI: http://192.168.1.26:5041/
Then check the master web UI again; the worker now shows up there.
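
As an extra sanity check, all four fixed ports should now be listening on the host (this works because both containers use --net=host):

# Expect LISTEN entries for the master/worker RPC and web UI ports
ss -ltn | grep -E ':(5030|5031|5040|5041)'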

Test

./spark-2.4.3-bin-hadoop2.7/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.1.26:5030 ./spark-2.4.3-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.3.jar 100

You should see output like the following in the terminal:
2019-06-06 11:34:56 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.886408 s
Pi is roughly 3.1414487141448713
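
For an interactive smoke test, spark-shell can be pointed at the same master:

# Open an interactive shell on the standalone cluster, then try e.g.:
#   sc.parallelize(1 to 1000).sum()
./spark-2.4.3-bin-hadoop2.7/bin/spark-shell --master spark://192.168.1.26:5030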
