Spark Cluster Setup + ZooKeeper-Based High Availability (HA)
1. Spark High Availability (HA)
1.1 Install ZooKeeper
1.1.1 Download zookeeper-3.4.6
1.1.2 Unpack ZooKeeper
1.1.3 Set ZOOKEEPER_HOME and add its bin directory to PATH
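For example, append lines like the following to ~/.bash_profile and re-source it; the install path here is an assumption, so use wherever you actually unpacked ZooKeeper:

export ZOOKEEPER_HOME=/home/iespark/hadoop_program_files/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin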
1.1.4 Create the logs and data directories: zookeeper-3.4.6]$ mkdir logs data
① cp conf/zoo_sample.cfg conf/zoo.cfg
② Edit zoo.cfg and set:
dataDir=**/zookeeper-3.4.6/data
dataLogDir=**/zookeeper-3.4.6/logs
# the hostnames and ports of the three servers in the ZooKeeper ensemble
server.0=master:2888:3888
server.1=slave1:2888:3888
server.2=slave2:2888:3888
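One setting from the sample file should be left as it is: the client port. It is the port the Spark configuration in section 1.2.1 connects to.

clientPort=2181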
③ In the data directory, create a file named myid and set its value to 0 (matching server.0 in zoo.cfg):
echo 0 > myid
1.1.5 scp the ZooKeeper directory to the other masters (production clusters typically run 3 masters)
Be sure to change myid on each of the other machines so that it matches that machine's server.N entry, as in the sketch below.
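A sketch of this step; the destination path is an assumption, so use whatever directory holds your ZooKeeper install:

scp -r zookeeper-3.4.6 slave1:/home/iespark/hadoop_program_files/
scp -r zookeeper-3.4.6 slave2:/home/iespark/hadoop_program_files/
# then fix myid on each copy, matching server.1 and server.2 in zoo.cfg:
ssh slave1 'echo 1 > /home/iespark/hadoop_program_files/zookeeper-3.4.6/data/myid'
ssh slave2 'echo 2 > /home/iespark/hadoop_program_files/zookeeper-3.4.6/data/myid'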
1.1.6 Start ZooKeeper
Start it on each of the three machines (the order does not matter):
bin]$ zkServer.sh start (the script requires an argument, here start)
After a successful start, jps shows a QuorumPeerMain process.
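To confirm that the ensemble is healthy and has elected a leader, ask each server for its role:

bin]$ zkServer.sh status
# one server reports "Mode: leader", the other two report "Mode: follower"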
1.2 Configure Spark to Use ZooKeeper for Master Recovery
1.2.1 vi spark-env.sh (on all three machines)
export JAVA_HOME=/usr/java/jdk1.8.0_20/
export SCALA_HOME=/home/iespark/hadoop_program_files/scala-2.10.6/
export HADOOP_HOME=/home/iespark/hadoop_program_files/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/iespark/hadoop_program_files/hadoop-2.6.0/etc/hadoop
# export SPARK_MASTER_IP=hadoop5   (no longer needed once ZooKeeper is configured; comment it out)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_WORKER_MEMORY=2g
export SPARK_EXECUTOR_MEMORY=2g
export SPARK_DRIVER_MEMORY=2g
export SPARK_WORKER_CORES=1
# export SPARK_PID_DIR=/home/iespark/hadoop_program_files/sparkdata
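Since this file must be identical on all three machines, you can edit it once and copy it out; a sketch, assuming Spark lives under the same path everywhere:

conf]$ scp spark-env.sh slave1:$PWD/
conf]$ scp spark-env.sh slave2:$PWD/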
1.2.2 Start Spark
On master: sbin]$ ./start-all.sh
You will find that only master has a Master process; slave1 and slave2 have none. Start a standby Master on each of the other machines by hand: sbin]$ ./start-master.sh (jps then shows a Master process on slave1 and slave2 as well).
You can check slave1:8080 and slave2:8080 in a browser: they show no workers, and their status is STANDBY.
1.2.3 Test
bin]$ ./spark-shell --master spark://master:7077,slave1:7077,slave2:7077
(the shell starts and runs normally)
On master, in the Spark directory, stop the active Master:
sbin]$ ./stop-master.sh
(The shell's connection to the Master now fails while it waits for the newly elected active Master (the election is handled by ZooKeeper) to take over the connection. Failover can take on the order of minutes. Applications that are already running continue undisturbed, since standalone scheduling is coarse-grained: their executors were allocated before the Master went down.)
The following message indicates that the new Master has taken over successfully:
Master has changed, new master is at spark://slave1:7077
Verify in the web console: the old master's UI is gone, while slave1 now shows the workers connected and a status of ALIVE.
Note: if the cluster is restarted, slave1 will still be used as the master by default.
2. Spark Installation
2.1 Unpack Spark and configure ~/.bash_profile, as sketched below.
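A minimal sketch of this step; the archive name and install path are assumptions inferred from the versions used later in this guide:

~]$ tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz
# append to ~/.bash_profile, then run: source ~/.bash_profile
export SPARK_HOME=/home/zkpk/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin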
2.2 Configure spark-env.sh
export JAVA_HOME=/home/zkpk/jdk/jdk1.8.0_60
export SCALA_HOME=/home/zkpk/scala-2.10.4
export HADOOP_HOME=/home/zkpk/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/zkpk/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=4g
export SPARK_EXECUTOR_MEMORY=4g
export SPARK_DRIVER_MEMORY=4g
export SPARK_WORKER_CORES=8
2.3 Configure SPARK_HOME/conf/slaves
slave1
slave2
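The configured Spark directory must also exist on both workers; a sketch, assuming the same home-directory layout on every machine:

~]$ scp -r spark-1.6.0-bin-hadoop2.6 slave1:/home/zkpk/
~]$ scp -r spark-1.6.0-bin-hadoop2.6 slave2:/home/zkpk/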
2.4 Configure spark-defaults.conf
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master:9000/historyserverforSpark
spark.yarn.historyServer.address master:18080
spark.history.fs.logDirectory    hdfs://master:9000/historyserverforSpark
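The event-log directory must exist in HDFS before the first job runs, and the history server is started separately; a sketch:

~]$ hadoop fs -mkdir -p /historyserverforSpark
SPARK_HOME/sbin]$ ./start-history-server.sh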
2.5 Test the Installation
2.5.1 spark-shell
SPARK_HOME/bin]$ ./spark-shell --master spark://master:7077
2.5.2 spark-submit
SPARK_HOME/bin]$ ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
3. Scala: Word Count Sorted by Frequency
scala> sc.textFile("hdfs://master:9000/home/data").flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).map(pair => (pair._2, pair._1)).sortByKey(false).map(pair => (pair._2, pair._1)).saveAsTextFile("hdfs://master:9000/home/out")
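The output directory holds one part-NNNNN file per partition; each line is a (word, count) pair, with the most frequent words first. To inspect the result:

~]$ hadoop fs -cat hdfs://master:9000/home/out/part-00000 | head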