Configure a Static IP

Run vi /etc/sysconfig/network-scripts/ifcfg-eth0 (note: each node uses a different IP):

DEVICE=eth0
HWADDR=00:0C:29:B4:3F:A2
TYPE=Ethernet
UUID=16bdaf21-574b-4e55-87fd-12797bc7da5c
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.1.201
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1
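Before restarting the network service to apply the change, it can help to sanity-check that the ifcfg file defines every key a static setup needs. A minimal sketch, run here against a temporary copy (point CFG at the real file on a node):

```shell
# Check that an ifcfg file defines every key a static-IP setup needs.
# Demonstrated on a temporary copy; use the real path on a node.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.201
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
EOF

missing=""
for key in DEVICE ONBOOT BOOTPROTO IPADDR NETMASK GATEWAY; do
    grep -q "^${key}=" "$CFG" || missing="$missing $key"
done

if [ -z "$missing" ]; then
    echo "ifcfg OK"
else
    echo "missing keys:$missing"
fi
rm -f "$CFG"
```

If the check passes, service network restart applies the new address on CentOS 6.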

Configure Hosts

Run vi /etc/hosts:

192.168.1.201 hadoop-cs198201
192.168.1.202 hadoop-cs198202
192.168.1.203 hadoop-cs198203
192.168.1.204 hadoop-cs198204

If the nodes need Internet access, set a nameserver as well: run vi /etc/resolv.conf, add the line nameserver 192.168.1.1, and save.
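If provisioning is scripted, the hosts entries can be appended idempotently so that re-running the step never duplicates lines. A sketch, using a scratch file in place of /etc/hosts:

```shell
# Append a host entry only if the hostname is not already present.
# HOSTS points at a scratch file here; use /etc/hosts on a real node.
HOSTS=$(mktemp)

add_host() {
    # $1 = IP address, $2 = hostname
    grep -qw "$2" "$HOSTS" || echo "$1 $2" >> "$HOSTS"
}

add_host 192.168.1.201 hadoop-cs198201
add_host 192.168.1.202 hadoop-cs198202
add_host 192.168.1.201 hadoop-cs198201   # re-run: no duplicate added

echo "entries: $(wc -l < "$HOSTS")"
```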

Remove the Bundled CentOS JDK

CentOS ships with a preinstalled OpenJDK. Before installing your own JDK, remove the bundled one with the following commands:

1. List the installed JDK packages:
   rpm -qa | grep jdk
   java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
   java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
2. Remove the packages:
   yum -y remove java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
   yum -y remove java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
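Rather than copying package names by hand, the removal commands can be generated from the rpm query itself. A dry-run sketch that only prints the commands; the simulated list below stands in for the output of rpm -qa on a real node:

```shell
# Build "yum -y remove <pkg>" commands from the installed-package list.
# The simulated input below stands in for: rpm -qa | grep -i openjdk
pkgs="java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64"

# Generate one removal command per matching package (printed, not run).
cmds=$(printf '%s\n' "$pkgs" | grep -i openjdk | sed 's/^/yum -y remove /')
printf '%s\n' "$cmds"
```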

Install the JDK

Extract the downloaded JDK into the installation directory, then configure the environment variables as follows (e.g. in /etc/profile):

export JAVA_HOME=/export/servers/jdk1.7.0_79
export JAVA_BIN=/export/servers/jdk1.7.0_79/bin
export HADOOP_HOME=/export/servers/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export HADOOP_MAPRED_HOME=$HADOOP_HOME 
export HADOOP_YARN_HOME=$HADOOP_HOME 
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib:$PATH

When installation is complete, run source /etc/profile to apply the changes, then run java -version to verify that Java was installed successfully.

Install ZooKeeper

There are plenty of tutorials online and most of them are accurate. If ZooKeeper times out at startup after installation, the most likely cause is that the firewall is still running; disable it and reboot the machine.

Install Hadoop

Download the Hadoop 2.2.0 package from https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/ and extract it into the installation directory.

Edit the Configuration Files

  • core-site.xml
<configuration>

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/data/data0/hadoop_tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop-cs198201:2181,hadoop-cs198202:2181,hadoop-cs198203:2181</value>
    </property>
    <property>
        <name>ha.zookeeper.parent-znode</name>
        <value>/hadoop-ha</value>
    </property>
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>5000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>io.native.lib.available</name>
        <value>true</value>
        <description>hadoop.native.lib is deprecated</description>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
    </property>
    <property>
          <name>hadoop.security.authentication</name>
          <value>simple</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
    <property> 
        <name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name> 
        <value>160000</value> 
    </property> 
    <property> 
        <name>ha.failover-controller.new-active.rpc-timeout.ms</name> 
        <value>360000</value> 
    </property> 
    <!-- OOZIE -->
    <property>
        <name>hadoop.proxyuser.oozie.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.oozie.groups</name>
        <value>*</value>
    </property>
    <!-- hive -->
    <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
    </property>
</configuration>
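After editing, it is worth confirming that a property actually carries the intended value by pulling it back out of the XML. A rough sketch using grep and sed (it assumes the one-line <value> layout used above; xmllint would be more robust):

```shell
# Extract the <value> that follows a given <name> in a Hadoop *-site.xml.
# Works only for the one-property-per-line layout shown in this article.
get_prop() {
    # $1 = property name, $2 = file
    grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Minimal stand-in for the real core-site.xml.
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>
EOF

get_prop fs.defaultFS "$SITE"
get_prop fs.trash.interval "$SITE"
```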
  • hdfs-site.xml
<configuration>
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value> 
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>hadoop-cs198201:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>hadoop-cs198202:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>hadoop-cs198201:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>hadoop-cs198202:50070</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.ns1.nn1</name>
  <value>hadoop-cs198201:53310</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.ns1.nn2</name>
  <value>hadoop-cs198202:53310</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop-cs198201:8485;hadoop-cs198202:8485;hadoop-cs198203:8485/ns1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/export/data/data0/journal/data</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>120000</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name> 
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/export/data/data0/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/export/data/data0/dfs</value>
</property>
<property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
</property>
<property> 
    <name>dfs.client.socket-timeout</name> 
    <value>180000</value> 
</property> 
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
</configuration>
  • mapred-site.xml
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>10</value>
</property>
<property> 
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>5</value>
</property>
<property>
   <name>mapreduce.jobhistory.address</name>
   <value>hadoop-cs198203:10020</value>
</property>
<property>
   <name>mapreduce.jobhistory.webapp.address</name>
   <value>hadoop-cs198203:19888</value>
</property>
</configuration>
  • yarn-site.xml
<configuration>
<property>
   <name>yarn.acl.enable</name>
   <value>true</value>
</property>
<property>
   <name>yarn.admin.acl</name>
   <value>hadoop</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
</property>
<property>
   <name>yarn.resourcemanager.address</name>
   <value>hadoop-cs198201:8032</value>
</property>
<property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>hadoop-cs198201:8030</value>
</property>
<property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>hadoop-cs198201:8031</value>
</property>
<property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>hadoop-cs198201:8033</value>
</property>
<property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>hadoop-cs198201:8088</value>
</property>
<property>
   <name>yarn.resourcemanager.scheduler.class</name>
   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
   <name>yarn.scheduler.fair.allocation.file</name>
   <value>/export/servers/hadoop-2.2.0/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
   <name>yarn.scheduler.assignmultiple</name>
   <value>true</value>
</property>
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>
<property>
   <name>yarn.scheduler.fair.locality.threshold.node</name>
   <value>0.1</value>
</property>
<property>
   <name>yarn.scheduler.fair.locality.threshold.rack</name>
   <value>0.1</value>
</property>     
<property>
   <name>yarn.nodemanager.pmem-check-enabled</name>
   <value>true</value>
</property>
<property>
   <name>yarn.nodemanager.vmem-check-enabled</name>
   <value>true</value>
</property>
<property>
   <name>yarn.nodemanager.local-dirs</name>
   <value>/export/data/data0/yarn/local,/export/data/data1/yarn/local,/export/data/data2/yarn/local</value>
</property>
<property>
   <name>yarn.log-aggregation-enable</name>
   <value>true</value>
</property>
<property>
   <name>yarn.nodemanager.log-dirs</name>
   <value>/export/data/data0/yarn/logs</value>
</property>
<property>
   <name>yarn.nodemanager.log.retain-seconds</name>
   <value>86400</value>
</property>
<property>
   <name>yarn.nodemanager.remote-app-log-dir</name>
   <value>/export/tmp/app-logs</value>
</property>
<property>
   <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
   <value>logs</value>
</property>
<property>
   <name>yarn.log-aggregation.retain-seconds</name>
   <value>259200</value>
</property>
<property>
   <name>yarn.log-aggregation.retain-check-interval-seconds</name>
   <value>86400</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
</configuration>
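The four edited files must be identical on every node. A dry-run sketch that prints the scp commands for the remaining hosts, using the hostnames and paths from this article (remove the echo to actually copy):

```shell
# Print the scp commands that would sync the edited config files from this
# node to the others. Remove the leading "echo" to copy for real.
CONF_DIR=/export/servers/hadoop-2.2.0/etc/hadoop
cmds=$(for host in hadoop-cs198202 hadoop-cs198203 hadoop-cs198204; do
    echo "scp $CONF_DIR/*-site.xml hadoop@$host:$CONF_DIR/"
done)
printf '%s\n' "$cmds"
```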

Start the Cluster

1. Start ZooKeeper on every ZooKeeper node: zkServer.sh start (ZooKeeper is normally deployed on an odd number of nodes)
2. Format the HA state in ZooKeeper; running this on one NameNode is enough: hdfs zkfc -formatZK
3. Start the JournalNode process on the JournalNode (here the ZooKeeper) nodes: hadoop-daemon.sh start journalnode
4. Format the Hadoop cluster: hdfs namenode -format ns1
5. Start the first NameNode: hadoop-daemon.sh start namenode
6. On the other NameNode, run:
   hdfs namenode -bootstrapStandby
   hadoop-daemon.sh start namenode
7. On a NameNode, run the following to start all remaining processes:
   start-dfs.sh
   start-yarn.sh
8. Once started, check whether each NameNode is active or standby:
   hdfs haadmin -getServiceState nn1
   standby
   hdfs haadmin -getServiceState nn2
   active
9. Start the history server: sh mr-jobhistory-daemon.sh start historyserver
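The nine steps above can be collected into a single reviewable script. The run wrapper below prints each command and only executes it when DRY_RUN is unset, so the sequence can be checked before it touches the cluster; which host each step belongs on is noted in the comments:

```shell
# HA cluster first-start sequence as a dry-run-able sketch.
# Keep DRY_RUN=1 to only print the commands; unset it to execute them.
DRY_RUN=1

run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

run zkServer.sh start                             # every ZooKeeper node
run hdfs zkfc -formatZK                           # one NameNode only
run hadoop-daemon.sh start journalnode            # every JournalNode
run hdfs namenode -format ns1                     # first NameNode only
run hadoop-daemon.sh start namenode               # first NameNode
run hdfs namenode -bootstrapStandby               # second NameNode
run hadoop-daemon.sh start namenode               # second NameNode
run start-dfs.sh                                  # from a NameNode
run start-yarn.sh                                 # from a NameNode
run mr-jobhistory-daemon.sh start historyserver   # history server node
```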

A few problems came up during installation:
1. Starting Hadoop reported a memory error. The likely cause is that the heap sizes configured in hadoop-env.sh or yarn-env.sh are too large; reduce them or fall back to the defaults.
2. start-dfs.sh reported that the hostname could not be resolved. The cause was missing Hadoop environment variables; adding them fixed the problem.
3. After updating environment variables, restart the cluster; otherwise both NameNodes may end up in standby state.
