Hadoop is an open-source software framework from Apache, implemented in Java; it is a platform for developing and running software that processes data at scale, and it lets you use a simple programming model to process large datasets distributed across clusters of many machines. Before you start, please make sure the prerequisites for building a Hadoop cluster are all in place; for details, see my earlier article: http://t.csdn.cn/FzkES

Hadoop installation package (Baidu Netdisk): https://pan.baidu.com/s/12R1q8ygEnosP9pVbX5rvxg
Extraction code: LZZY

I. Upload and extract the Hadoop archive

1. Upload the Hadoop archive to the CentOS 7 system. You can drag the archive straight into the system's root directory, as shown in the figure (the package is on the Baidu Netdisk link above if you need it). Alternatively, you can fetch it from the official archive with the wget command, but downloading that way is very slow, so I still recommend the first approach. The full wget command is: wget http://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
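If you download with wget, it is worth checking the file against the SHA-512 digest that Apache publishes alongside each release tarball. A minimal sketch (the .sha512 URL follows Apache's usual naming convention for release checksums):

wget http://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz.sha512
# print our digest and the published one, then compare them by eye
sha512sum hadoop-3.1.3.tar.gz
cat hadoop-3.1.3.tar.gz.sha512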

2. Next, extract the Hadoop archive. Use the command: tar -zxvf hadoop-3.1.3.tar.gz -C /export/server/ to extract it into the /export/server directory. A successful extraction looks like the figure below. If the archive will not extract, it was most likely corrupted during upload; delete the archive on the virtual machine and upload it again.
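A quick way to confirm the extraction worked is to list the target directory:

ls -l /export/server/
# a hadoop-3.1.3 directory should now be present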

3. As with the JDK installation earlier, create a symbolic link for Hadoop so later steps are more convenient.

ln -s /export/server/hadoop-3.1.3 /export/server/hadoop

II. Edit the Hadoop configuration files

1. First, change into Hadoop's configuration directory:

cd /export/server/hadoop/etc/hadoop

The etc directory under the Hadoop home is where all of the configuration files live.
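To double-check you are in the right place, list the directory; every file edited in the steps below should be here (listing abridged):

ls /export/server/hadoop/etc/hadoop
# core-site.xml  hadoop-env.sh  hdfs-site.xml  mapred-env.sh
# mapred-site.xml  workers  yarn-env.sh  yarn-site.xml  ...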

 

2. Edit the configuration file hadoop-env.sh

Use the command: vim hadoop-env.sh and add the following block at the top of the file.

# Java installation path
export JAVA_HOME=/export/server/jdk
# Hadoop installation path
export HADOOP_HOME=/export/server/hadoop
# Hadoop HDFS configuration file path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN configuration file path
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN log directory
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Hadoop HDFS log directory
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs

# Users that the Hadoop daemons run as
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root

Once the block is in place (as shown in the figure below), save and quit. (To save in vim, press Esc to leave insert mode, then type :wq, where w writes the file and q quits. I will publish a separate post later on vim usage and its common commands.)
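A quick sanity check that the edits took effect (a sketch; the grep pattern just matches the lines added above):

grep -E '^export (JAVA_HOME|HADOOP_HOME|HADOOP_CONF_DIR)' hadoop-env.sh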

3. Edit the configuration file core-site.xml, replacing everything in core-site.xml with the code below.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://node1:8020</value>
        </property>

        <property>
                <name>io.file.buffer.size</name>
		<value>131072</value>
		<description></description>
        </property>
</configuration>
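For context: fs.defaultFS is the default filesystem URI that clients and daemons use (here, the NameNode RPC endpoint on node1, port 8020), and io.file.buffer.size (131072 bytes = 128 KiB) is the read/write buffer size. Once the environment variables from Part III are in place, you can confirm the value Hadoop actually resolves:

hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://node1:8020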

4. Edit the configuration file hdfs-site.xml, replacing everything in hdfs-site.xml with the code below.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->


<configuration>
	<property>
		<name>dfs.datanode.data.dir.perm</name>
		<value>700</value>
	</property>

	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/data/nn</value>
		<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
	</property>

	<property>
		<name>dfs.namenode.hosts</name>
		<value>node1,node2,node3</value>
		<description>List of permitted DataNodes.</description>
	</property>

	<property>
		<name>dfs.blocksize</name>
		<value>268435456</value>
		<description></description>
	</property>

	<property>
		<name>dfs.namenode.handler.count</name>
		<value>100</value>
		<description></description>
	</property>

	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/data/dn</value>
	</property>
</configuration>
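For reference, dfs.blocksize is specified in bytes: 268435456 = 256 × 1024 × 1024, i.e. a 256 MiB HDFS block size, while dfs.namenode.handler.count = 100 sets how many RPC handler threads the NameNode runs. The arithmetic is easy to check in the shell:

echo $((256 * 1024 * 1024))
# 268435456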

5. Edit the configuration file mapred-env.sh, adding the code below at the top of mapred-env.sh.

# Java installation path
export JAVA_HOME=/export/server/jdk
# Heap size (in MB) for the MapReduce JobHistory Server
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
# Root log level and appender for MapReduce daemons
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA

6. Edit the configuration file mapred-site.xml, replacing everything in mapred-site.xml with the code below.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
		<description></description>
	</property>

	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>node1:10020</value>
		<description></description>
	</property>

	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>node1:19888</value>
		<description></description>
	</property>

	<property>
		<name>mapreduce.jobhistory.intermediate-done-dir</name>
		<value>/data/mr-history/tmp</value>
		<description></description>
	</property>

	<property>
		<name>mapreduce.jobhistory.done-dir</name>
		<value>/data/mr-history/done</value>
		<description></description>
	</property>

	<property>
		<name>yarn.app.mapreduce.am.env</name>
		<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
	</property>

	<property>
		<name>mapreduce.map.env</name>
		<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
	</property>

	<property>
		<name>mapreduce.reduce.env</name>
		<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
	</property>
</configuration>

7. Edit the configuration file yarn-env.sh, replacing everything in yarn-env.sh with the code below.

# Java installation path
export JAVA_HOME=/export/server/jdk
# Hadoop installation path
export HADOOP_HOME=/export/server/hadoop
# Hadoop HDFS configuration file path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN configuration file path
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN log directory
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Hadoop HDFS log directory
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs

8. Edit the configuration file yarn-site.xml, replacing everything in yarn-site.xml with the code below.

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.log.server.url</name>
		<value>http://node1:19888/jobhistory/logs</value>
		<description></description>
	</property>

	<property>
		<name>yarn.web-proxy.address</name>
		<value>node1:8089</value>
		<description>proxy server hostname and port</description>
	</property>

	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
		<description>Configuration to enable or disable log aggregation</description>
	</property>

	<property>
		<name>yarn.nodemanager.remote-app-log-dir</name>
		<value>/tmp/logs</value>
		<description>Directory on the default filesystem where aggregated application logs are placed.</description>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>node1</value>
		<description></description>
	</property>

	<property>
		<name>yarn.resourcemanager.scheduler.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
		<description></description>
	</property>

	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>/data/nm-local</value>
		<description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
	</property>

	<property>
		<name>yarn.nodemanager.log-dirs</name>
		<value>/data/nm-log</value>
		<description>Comma-separated list of paths on the local filesystem where logs are written.</description>
	</property>

	<property>
		<name>yarn.nodemanager.log.retain-seconds</name>
		<value>10800</value>
		<description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.</description>
	</property>

	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
		<description>Shuffle service that needs to be set for MapReduce applications.</description>
	</property>
</configuration>

9. Insert the following lines into the workers configuration file. This file tells the start-up scripts which hosts should run a DataNode and a NodeManager.

node1
node2
node3
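If you would rather not open an editor, the same file can be written in one command (a sketch; the path follows this tutorial's layout):

cat > /export/server/hadoop/etc/hadoop/workers <<'EOF'
node1
node2
node3
EOF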

III. Set up Hadoop on the other nodes

1. Distribute Hadoop to the other virtual machines; this only needs to be done from node1. Use the command: cd /export/server to enter the Hadoop installation directory.

Send it to node2:

scp -r hadoop-3.1.3 node2:`pwd`/ 

Send it to node3:

scp -r hadoop-3.1.3 node3:`pwd`/
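The two scp commands can also be collapsed into a loop, as a sketch that assumes the passwordless SSH set up in the prerequisite article:

cd /export/server
for host in node2 node3; do
  scp -r hadoop-3.1.3 "${host}:$(pwd)/"
done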

2. After distribution, create the Hadoop symlink on node2 and node3 in the same way as before.

ln -s /export/server/hadoop-3.1.3 /export/server/hadoop
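Rather than logging in to each machine, the links can be created remotely from node1 (again assuming passwordless SSH):

for host in node2 node3; do
  ssh "$host" 'ln -s /export/server/hadoop-3.1.3 /export/server/hadoop'
done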

3. Create the working directories.

Create the following directories on node1:

mkdir -p /data/nn
mkdir -p /data/dn
mkdir -p /data/nm-log 
mkdir -p /data/nm-local

Create the following directories on node2:

mkdir -p /data/dn 
mkdir -p /data/nm-log 
mkdir -p /data/nm-local

Create the following directories on node3 (or use the one-liner after this list):

mkdir -p /data/dn 
mkdir -p /data/nm-log 
mkdir -p /data/nm-local
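Since node2 and node3 need identical directories, you can also create them from node1 over SSH instead of switching machines (a sketch assuming passwordless SSH):

for host in node2 node3; do
  ssh "$host" 'mkdir -p /data/dn /data/nm-log /data/nm-local'
done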

4. Configure the environment variables

On node1, node2, and node3, edit /etc/profile and append the following lines at the very bottom of the file:

export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Note that this step must be performed on all three virtual machines. After saving and quitting, you must also run the command: source /etc/profile for the change to take effect.
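A quick check that the PATH update works:

source /etc/profile
hadoop version
# the first line of output should read: Hadoop 3.1.3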

5. Format the NameNode

This is done on node1 only, with the command: hadoop namenode -format. The hadoop command comes from the programs in $HADOOP_HOME/bin; because PATH was configured above, you can run hadoop from any directory.
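Formatting initializes the NameNode's metadata store and should be done exactly once; reformatting later would wipe the HDFS metadata. To confirm it succeeded, check that the storage directory configured as dfs.namenode.name.dir was populated:

ls /data/nn/current/
# should contain VERSION, fsimage_* and related files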

IV. Start the Hadoop cluster

1. Start the HDFS cluster; running this on node1 is enough:

start-dfs.sh
# to stop it, run
stop-dfs.sh

2. Start the YARN cluster; running this on node1 is enough:

start-yarn.sh
# to stop it, run
stop-yarn.sh
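Once YARN is up, you can confirm that all three NodeManagers registered with the ResourceManager:

yarn node -list
# node1, node2 and node3 should appear with state RUNNING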

3. Start the history server

mapred --daemon start historyserver
# to stop it, run
mapred --daemon stop historyserver

Note: if jps shows that the history server did not start, cd into the sbin directory under the Hadoop home and run the command mr-jobhistory-daemon.sh start historyserver instead.

4. Start the web proxy server

yarn --daemon start proxyserver
# to stop it, run
yarn --daemon stop proxyserver
# (the older yarn-daemon.sh start proxyserver still works in Hadoop 3, but prints a deprecation warning)

V. Verify that the Hadoop cluster was set up successfully

1. On node1, node2, and node3, run jps to verify that every process started successfully.
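With the role layout used in this tutorial, the processes you should roughly expect per node are sketched below (process IDs will differ, and the SecondaryNameNode typically lands on whichever host ran start-dfs.sh):

# on node1
jps
# NameNode, DataNode, SecondaryNameNode, ResourceManager,
# NodeManager, JobHistoryServer, WebAppProxyServer, Jps

# on node2 and node3
jps
# DataNode, NodeManager, Jps

You can also open the NameNode web UI at http://node1:9870 and the ResourceManager web UI at http://node1:8088 (the Hadoop 3.x default ports).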

With that, our Hadoop cluster is fully set up. Thank you for reading my blog! I hope this post gave you some new ideas and inspiration. If you have any questions about the points covered here, or topics you would like to discuss further, leave a comment and I will do my best to answer. I look forward to exploring more interesting topics with you; see you next time!
