Building a Spark Cluster with Docker Compose
Introduction
Compose is Docker's official orchestration tool. By writing a simple template file, users can quickly build and manage an application cluster based on Docker containers. Its stated purpose is to "define and run applications made up of multiple Docker containers": a YAML template file defines a group of related application containers as a single project.
Installation
My environment: CentOS 7.3, Docker version 1.12.6
Install with pip
sudo pip install -U docker-compose
Verify
# docker-compose version
docker-compose version 1.17.1, build 6d101fb
docker-py version: 2.6.1
CPython version: 2.7.5
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
Add bash command completion:
curl -L https://raw.githubusercontent.com/docker/compose/1.17.1/contrib/completion/bash/docker-compose > /etc/bash_completion.d/docker-compose
Note: the curl command must stay on a single line. If the URL gets wrapped when pasting, curl fails with output like:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    15  100    15    0     0     12      0  0:00:01  0:00:01 --:--:--    12
curl: (3) <url> malformed
On success, curl only prints its progress meter.
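To make the completion available in the current shell without logging out and back in, you can source the file you just downloaded:
# load the completion script into the current shell session
source /etc/bash_completion.d/docker-compose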
Pull the image
The sequenceiq/spark image (https://hub.docker.com/r/sequenceiq/spark/) already contains a full Spark installation and its dependencies. Pull it to the local machine first:
docker pull sequenceiq/spark:1.6.0
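To confirm the image was pulled successfully, you can list the local copies:
# list local copies of the sequenceiq/spark image and their tags
docker images sequenceiq/spark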
Create a docker-compose.yml file with the following content:
# http://github.com/yeasy/docker-compose-files
# This compose file will start spark master node and the worker node.
# All nodes will become a cluster automatically.
# You can run: docker-compose scale worker=2
# After startup, try submit a pi calculation application.
# /usr/local/spark/bin/spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar 1000
master:
  image: sequenceiq/spark:1.6.0
  hostname: master
  ports:
    - "4040:4040"
    - "8042:8042"
    - "7077:7077"
    - "8088:8088"
    - "8080:8080"
  restart: always
  #mem_limit: 1024m
  command: bash /usr/local/spark/sbin/start-master.sh && ping localhost > /dev/null
worker:
  image: sequenceiq/spark:1.6.0
  links:
    - master:master
  expose:
    - "8081"
  restart: always
  command: bash /usr/local/spark/sbin/start-slave.sh spark://master:7077 && ping localhost > /dev/null
File walkthrough:
The master service
First, the master service maps several ports to the host:
* 4040: the web UI Spark serves while a job is running, showing how the job is progressing, which stage it is in, and which executor each task runs on;
* 8042: the Hadoop NodeManager web UI;
* 7077: the Spark master's listening port; applications are submitted to this port, and worker nodes also connect to it to join the cluster;
* 8080: the Spark master web UI, showing all workers and overall application information;
* 8088: the Hadoop (YARN) cluster web UI.
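Once the cluster is up (see "Start the cluster" below), a quick way to check these host mappings is to request each UI from the host and look at the HTTP status code (a small sketch; it assumes you run it on the Docker host itself):
# print the HTTP status code returned by each web UI published on the host
curl -s -o /dev/null -w "8080 -> %{http_code}\n" http://localhost:8080
curl -s -o /dev/null -w "8088 -> %{http_code}\n" http://localhost:8088
curl -s -o /dev/null -w "8042 -> %{http_code}\n" http://localhost:8042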
After the master service starts, it runs bash /usr/local/spark/sbin/start-master.sh to configure itself as the master node, and then runs ping to keep the container from exiting.
The worker service
Much like the master, after startup the worker runs /usr/local/spark/sbin/start-slave.sh spark://master:7077 to configure itself as a worker node, and then uses ping to keep the container from exiting.
Note that the startup script must be passed the spark://master:7077 argument to tell it the address of the master node.
The web UI served on port 8081 shows the detailed status of the tasks executing on that worker node.
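Because 8081 is only exposed inside the Docker network rather than published to the host, one way to reach it from the host is to look up the worker container's IP first (a sketch; the container name sparkcompose_worker_1 is an assumption that depends on your project directory, and direct access to container IPs works on a Linux host such as the CentOS machine used here):
# look up the worker container's IP address (container name assumed; adjust to yours)
WORKER_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sparkcompose_worker_1)
# fetch the first lines of the worker web UI directly from the host
curl -s "http://${WORKER_IP}:8081" | head -n 5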
For details on the Compose template file format, see: http://www.jianshu.com/p/2217cfed29d7
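Before bringing the cluster up, you can sanity-check the template with docker-compose config, which validates the file and prints the resolved configuration (run it in the directory that contains docker-compose.yml):
# validate docker-compose.yml and print the effective configuration
docker-compose config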
Start the cluster
After creating the file, run the following from the directory that contains it:
docker-compose up
You will see output similar to the following:
Creating sparkcompose_master_1...
Creating sparkcompose_slave_1...
Attaching to sparkcompose_master_1, sparkcompose_slave_1
master_1 | /
master_1 | Starting sshd: [ OK ]
slave_1 | /
slave_1 | Starting sshd: [ OK ]
master_1 | Starting namenodes on [master]
slave_1 | Starting namenodes on [5d0ea02da185]
master_1 | master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out
slave_1 | 5d0ea02da185: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-5d0ea02da185.out
master_1 | localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-master.out
slave_1 | localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-5d0ea02da185.out
master_1 | Starting secondary namenodes [0.0.0.0]
slave_1 | Starting secondary namenodes [0.0.0.0]
master_1 | 0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-master.out
master_1 | starting yarn daemons
master_1 | starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-master.out
master_1 | localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-master.out
master_1 | starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark--org.apache.spark.deploy.master.Master-1-master.out
slave_1 | 0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-5d0ea02da185.out
slave_1 | starting yarn daemons
slave_1 | starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-5d0ea02da185.out
slave_1 | localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-5d0ea02da185.out
slave_1 | starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark--org.apache.spark.deploy.worker.Worker-1-5d0ea02da185.out
Then open another terminal. With the docker-compose services running, you can use the scale command to change the number of Spark worker nodes on the fly, for example:
$ docker-compose scale worker=2
Creating and starting 2... done
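You can confirm that the additional worker started with docker-compose ps; the master UI on http://localhost:8080 should also list two workers shortly afterwards.
# list the containers managed by this compose project; both worker entries should show an "Up" state
docker-compose ps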
Test
Enter the master container (the container name is prefixed with the compose project name, which defaults to the directory name, so adjust root_master_1 to match your setup):
docker exec -it root_master_1 /bin/bash
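If you are unsure of the exact container name, you can let docker-compose resolve it for you (a small sketch):
# show the compose-managed containers, then exec into the container backing the master service
docker-compose ps
docker exec -it $(docker-compose ps -q master) /bin/bash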
Start the Spark shell:
bash-4.1# spark-shell
It prints the following:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Type a Scala expression to check that Spark is working:
scala> sc.parallelize(1 to 1000).count()
res0: Long = 1000
scala>
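The shell above may run in local mode by default; to make sure the computation is actually scheduled on the cluster, you can point the shell at the standalone master explicitly (spark-shell is on the PATH inside this image, as shown above):
# attach the interactive shell to the standalone master so jobs run on the workers
spark-shell --master spark://master:7077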
Run an application
Spark recommends submitting applications with the spark-submit command. Its basic syntax is:
spark-submit \
  --class your-class-name \
  --master master_url \
  your-jar-file \
  app_params
For example, we can run the Pi-estimation application from the examples bundled with Spark.
Run the following command on the master node (the examples jar name and version may differ depending on the Spark version in the image; adjust the path if necessary):
/usr/local/spark/bin/spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar 1000
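SparkPi prints its estimate on the driver's standard output; if you only want that line, you can filter for it (a sketch; the jar path follows the comment in the compose file and may differ in your image):
# submit SparkPi and keep only the result line (e.g. "Pi is roughly 3.14...")
/usr/local/spark/bin/spark-submit --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar 1000 2>&1 | grep "Pi is roughly"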
With that, a Spark cluster built quickly from Docker images is up and running. If you really want to learn Spark well, though, it is still worth setting up each component step by step yourself, so that you understand why the environment is built the way it is.
Here is a complete manual setup walkthrough: https://www.cnblogs.com/jasonfreak/p/5391190.html