Hadoop 2.8.3 cluster setup
Linux: CentOS 7
Hadoop version: hadoop-2.8.3
JDK: 1.8.0_161
Here I plan to set up a 3-node Hadoop cluster directly as the root user. Close the firewall first; after the installation succeeds, open specific ports as needed wherever startup reports errors.
/etc/hosts
192.168.247.129 master
192.168.247.131 slave1
192.168.247.132 slave2
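On CentOS 7 the stock firewall service is firewalld; a minimal sketch for closing it, run on all three nodes:
# stop the firewall for the current session
systemctl stop firewalld
# prevent it from starting again on boot
systemctl disable firewalld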
Configure the JDK
hadoop-2.8.3 requires a JDK 1.8 environment; JDK 1.8.0_161 is used here, and all three systems use the same JDK version. The configuration steps are omitted.
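As a hedged sketch of the omitted step (assuming the JDK is unpacked to /usr/local/java/jdk1.8.0_161, the same path referenced in hadoop-env.sh later), the usual /etc/profile entries are:
# set java
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin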
Configure passwordless SSH login
For details, see the previous article on configuring passwordless SSH login on CentOS 7.
Run the following commands on each of the three machines.
Generate the public/private key pair
cd /root/.ssh/
ssh-keygen -t rsa
Copy the public key to the other machines
This effectively appends the generated public key id_rsa.pub to the end of authorized_keys on each of the other machines.
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
Verify
You will be asked for a password on the first login; after exiting, log in to the other machines again and check that no password is needed.
Confirm that logins between any two of the three machines require no password:
ssh master
ssh slave1
ssh slave2
If the SSH public key does not take effect, make sure the following two conditions hold: 1. the .ssh directory permissions must be 700; 2. the .ssh/authorized_keys file permissions must be 600.
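A quick sketch for fixing those permissions (paths are for root, since the whole setup runs as root):
# the .ssh directory must not be group- or world-accessible
chmod 700 /root/.ssh
# authorized_keys must be readable and writable by the owner only
chmod 600 /root/.ssh/authorized_keys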
Configure Hadoop
Here I put the Hadoop distribution and its data directories under /opt. Create a hadoop directory under /opt, upload (or download) the installation package there, and extract hadoop-2.8.3.tar.gz. Under /opt/hadoop, create a data directory hdfs, and inside hdfs create three directories, data, name, and tmp, to hold the Hadoop file system's data.
cd /opt
mkdir hadoop
cd hadoop
tar -xzvf hadoop-2.8.3.tar.gz
mkdir hdfs
cd hdfs
mkdir data name tmp
Directory structure
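Sketched from the directories created above (the original post showed a screenshot here), the layout under /opt/hadoop is:
/opt/hadoop
├── hadoop-2.8.3
└── hdfs
    ├── data
    ├── name
    └── tmp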
Add the HADOOP environment variables to the system
/etc/profile
# set hadoop
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Configuring $HADOOP_HOME/sbin here makes it possible to run the cluster start and stop scripts directly from any shell, without first changing into the sbin directory.
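To make the new variables take effect in the current shell without logging in again:
source /etc/profile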
Add JAVA_HOME to Hadoop
Append the following to the end of /opt/hadoop/hadoop-2.8.3/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
If /etc/profile already exports JAVA_HOME=/usr/local/java/jdk1.8.0_161, hadoop-env.sh seemingly picks that variable up by default, since it ships with the lines below. In practice, setting JAVA_HOME explicitly in hadoop-env.sh is safer: daemons launched over ssh on remote nodes may not source /etc/profile and would then see an empty JAVA_HOME.
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
Configure the cluster
This involves five configuration files, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves, one per component.
They live under /opt/hadoop/hadoop-2.8.3/etc/hadoop/.
| File | Description |
| --- | --- |
| core-site.xml | Common component |
| hdfs-site.xml | HDFS component |
| mapred-site.xml | MapReduce component |
| yarn-site.xml | YARN component |
| slaves | slave nodes |
core-site.xml
/opt/hadoop/hadoop-2.8.3/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<!-- fs.default.name is the deprecated alias in Hadoop 2.x; fs.defaultFS is the current key -->
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml
In /opt/hadoop/hadoop-2.8.3/etc/hadoop/, the MapReduce configuration file ships as mapred-site.xml.template; copy or rename it to mapred-site.xml.
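For example, from that directory:
cp mapred-site.xml.template mapred-site.xml
The resulting mapred-site.xml: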
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Tell the framework to use YARN for MapReduce -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Note on yarn.nodemanager.aux-services: its old default value was mapreduce.shuffle, but in Hadoop 2.x service names may only contain letters, digits, and underscores, so configuring that old value makes the NodeManager fail at startup and the service won't run on the slave nodes. Use mapreduce_shuffle, as above.
Edit the slaves file and add the slave nodes
Remove the original localhost and replace it with the lines below. The slaves file ties all the nodes together into one cluster, so that the whole cluster starts together at startup.
slave1
slave2
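Worth making explicit: the same /opt/hadoop tree (plus the /etc/profile and hadoop-env.sh changes) must exist on all three machines. If everything so far was done only on master, a hedged sketch to push it out:
# copy the whole installation and data layout to each slave
scp -r /opt/hadoop slave1:/opt
scp -r /opt/hadoop slave2:/opt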
Run Hadoop
Format the Hadoop file system first, then start the cluster.
Format the NameNode
hdfs namenode -format
(The older form hadoop namenode -format still works in 2.x but prints a deprecation warning.)
If formatting succeeds, the terminal shows:
common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
Start Hadoop
The individual commands here can be skipped; see the start-all.sh section below.
Startup order: namenode -> datanode -> HDFS -> YARN
Go to the ${HADOOP_HOME}/sbin directory.
Start the NameNode:
./hadoop-daemon.sh start namenode
Start the DataNodes (note the plural hadoop-daemons.sh, which runs the command on every node listed in slaves):
./hadoop-daemons.sh start datanode
Start HDFS:
./start-dfs.sh
Start YARN:
./start-yarn.sh
Here I use start-all.sh, which starts every cluster service, so there is no need to start them one by one as above. The startup log looks like this.
Start the cluster: start-all.sh
[root@master sbin]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-slave1.out
Stop the cluster: stop-all.sh
[root@master sbin]# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave1: stopping datanode
slave2: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
slave2: stopping nodemanager
slave1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
Because ${HADOOP_HOME}/sbin is on the PATH, all the scripts under sbin can be run directly from any shell.
Processes on each node
- master
[root@master sbin]# jps
5793 Jps
5512 ResourceManager
5165 NameNode
5359 SecondaryNameNode
- slave1, slave2
[root@slave1 ~]# jps
1700 DataNode
1830 NodeManager
1961 Jps
[root@slave2 ~]# jps
1971 Jps
1798 NodeManager
1689 DataNode
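Beyond jps, HDFS itself can confirm that both DataNodes registered with the NameNode:
hdfs dfsadmin -report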
The logs directory
The ${HADOOP_HOME}/logs directory holds the daemon logs; if startup fails, check the logs there to troubleshoot.
Web UI
Visit the master's web UI to confirm the other two nodes are running normally. With this configuration, the HDFS NameNode UI is at http://master:50070 (the Hadoop 2.x default port) and the YARN ResourceManager UI is at http://master:18088 (the port set in yarn-site.xml above).
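As a final smoke test, the examples jar bundled with the distribution (assuming its standard location under share/ in the 2.8.3 tarball) runs a small MapReduce pi job through YARN:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar pi 2 10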