Hadoop 2.8.3 cluster setup
Linux: CentOS 7
Hadoop version: hadoop-2.8.3
JDK: 1.8.0_161
Here I plan to set up a 3-node Hadoop cluster directly as the root user. Close the firewall first; after the installation succeeds, open specific ports as needed wherever startup reports errors.
/etc/hosts
192.168.247.129 master
192.168.247.131 slave1
192.168.247.132 slave2
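On CentOS 7 the stock firewall service is firewalld; a minimal sketch for closing it, run on all three nodes:
# stop the firewall for the current session
systemctl stop firewalld
# prevent it from starting again on boot
systemctl disable firewalld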
Configure the JDK
hadoop-2.8.3 requires a JDK 1.8 environment; JDK 1.8.0_161 is used here, and all three systems use the same JDK version. The configuration steps are omitted.
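As a hedged sketch of the omitted step (assuming the JDK is unpacked to /usr/local/java/jdk1.8.0_161, the same path referenced in hadoop-env.sh later), the usual /etc/profile entries are:
# set java
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin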
Configure passwordless SSH login
For details, see the previous article on configuring passwordless SSH login on CentOS 7.
Run the following commands on each of the three machines.
Generate the public/private key pair
cd /root/.ssh/
ssh-keygen -t rsa
Copy the public key to the other machines
This effectively appends the generated public key id_rsa.pub to the end of authorized_keys on each of the other machines.
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
Verify
You will be asked for a password on the first login; after exiting, log in to the other machines again and check that no password is needed.
Confirm that logins between any two of the three machines require no password:
ssh master
ssh slave1
ssh slave2
If the SSH public key does not take effect, make sure the following two conditions hold: 1. the .ssh directory permissions must be 700; 2. the .ssh/authorized_keys file permissions must be 600.
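A quick sketch for fixing those permissions (paths are for root, since the whole setup runs as root):
# the .ssh directory must not be group- or world-accessible
chmod 700 /root/.ssh
# authorized_keys must be readable and writable by the owner only
chmod 600 /root/.ssh/authorized_keys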
Configure Hadoop
Here I put the Hadoop distribution and its data directories under /opt. Create a hadoop directory under /opt, upload (or download) the installation package there, and extract hadoop-2.8.3.tar.gz. Under /opt/hadoop, create a data directory hdfs, and inside hdfs create three directories, data, name, and tmp, to hold the Hadoop file system's data.
cd /opt
mkdir hadoop
cd hadoop
tar -xzvf hadoop-2.8.3.tar.gz
mkdir hdfs
cd hdfs
mkdir data name tmp
Directory structure
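Sketched from the directories created above (the original post showed a screenshot here), the layout under /opt/hadoop is:
/opt/hadoop
├── hadoop-2.8.3
└── hdfs
    ├── data
    ├── name
    └── tmp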
Add the HADOOP environment variables to the system
/etc/profile
# set hadoop
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Configuring $HADOOP_HOME/sbin here makes it possible to run the cluster start and stop scripts directly from any shell, without first changing into the sbin directory.
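To make the new variables take effect in the current shell without logging in again:
source /etc/profile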
Add JAVA_HOME to Hadoop
Append the following to the end of /opt/hadoop/hadoop-2.8.3/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
If /etc/profile already exports JAVA_HOME=/usr/local/java/jdk1.8.0_161, hadoop-env.sh seemingly picks that variable up by default, since it ships with the lines below. In practice, setting JAVA_HOME explicitly in hadoop-env.sh is safer: daemons launched over ssh on remote nodes may not source /etc/profile and would then see an empty JAVA_HOME.
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
Configure the cluster
This involves five configuration files, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves, one per component.
They live under /opt/hadoop/hadoop-2.8.3/etc/hadoop/.
| File | Description |
| --- | --- |
| core-site.xml | Common component |
| hdfs-site.xml | HDFS component |
| mapred-site.xml | MapReduce component |
| yarn-site.xml | YARN component |
| slaves | slave nodes |
core-site.xml
/opt/hadoop/hadoop-2.8.3/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<!-- fs.default.name is the deprecated alias in Hadoop 2.x; fs.defaultFS is the current key -->
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml
In /opt/hadoop/hadoop-2.8.3/etc/hadoop/, the MapReduce configuration file ships as mapred-site.xml.template; copy or rename it to mapred-site.xml.
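For example, from that directory:
cp mapred-site.xml.template mapred-site.xml
The resulting mapred-site.xml: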
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Tell the framework to use YARN for MapReduce -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Note on yarn.nodemanager.aux-services: its old default value was mapreduce.shuffle, but in Hadoop 2.x service names may only contain letters, digits, and underscores, so configuring that old value makes the NodeManager fail at startup and the service won't run on the slave nodes. Use mapreduce_shuffle, as above.
Edit the slaves file and add the slave nodes
Remove the original localhost and replace it with the lines below. The slaves file ties all the nodes together into one cluster, so that the whole cluster starts together at startup.
slave1
slave2
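Worth making explicit: the same /opt/hadoop tree (plus the /etc/profile and hadoop-env.sh changes) must exist on all three machines. If everything so far was done only on master, a hedged sketch to push it out:
# copy the whole installation and data layout to each slave
scp -r /opt/hadoop slave1:/opt
scp -r /opt/hadoop slave2:/opt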
Run Hadoop
Format the Hadoop file system first, then start the cluster.
Format the NameNode
hdfs namenode -format
(The older form hadoop namenode -format still works in 2.x but prints a deprecation warning.)
If formatting succeeds, the terminal shows:
common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
Start Hadoop
The individual commands here can be skipped; see the start-all.sh section below.
Startup order: namenode -> datanode -> HDFS -> YARN
Go to the ${HADOOP_HOME}/sbin directory.
Start the NameNode:
./hadoop-daemon.sh start namenode
Start the DataNodes (note the plural hadoop-daemons.sh, which runs the command on every node listed in slaves):
./hadoop-daemons.sh start datanode
Start HDFS:
./start-dfs.sh
Start YARN:
./start-yarn.sh
Here I use start-all.sh, which starts every cluster service, so there is no need to start them one by one as above. The startup log looks like this.
Start the cluster: start-all.sh
[root@master sbin]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-slave1.out
Stop the cluster: stop-all.sh
[root@master sbin]# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave1: stopping datanode
slave2: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
slave2: stopping nodemanager
slave1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
Because ${HADOOP_HOME}/sbin is on the PATH, all the scripts under sbin can be run directly from any shell.
Processes on each node
- master
[root@master sbin]# jps
5793 Jps
5512 ResourceManager
5165 NameNode
5359 SecondaryNameNode
- slave1, slave2
[root@slave1 ~]# jps
1700 DataNode
1830 NodeManager
1961 Jps
[root@slave2 ~]# jps
1971 Jps
1798 NodeManager
1689 DataNode
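Beyond jps, HDFS itself can confirm that both DataNodes registered with the NameNode:
hdfs dfsadmin -report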
The logs directory
The ${HADOOP_HOME}/logs directory holds the daemon logs; if startup fails, check the logs there to troubleshoot.
Web UI
Visit the master's web UI to confirm the other two nodes are running normally. With this configuration, the HDFS NameNode UI is at http://master:50070 (the Hadoop 2.x default port) and the YARN ResourceManager UI is at http://master:18088 (the port set in yarn-site.xml above).
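As a final smoke test, the examples jar bundled with the distribution (assuming its standard location under share/ in the 2.8.3 tarball) runs a small MapReduce pi job through YARN:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar pi 2 10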