Hadoop Fundamentals (4): Setting Up a Hadoop Cluster with Docker
[Note]
- This environment is for learning only: it uses the weak password 000000 and exposes many ports, so it is not secure.
- A machine with at least 12 GB of RAM is recommended. On a 4-core / 4 GB test machine the cluster failed to start; the workaround was to create 8 GB of swap (virtual memory).
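The swap workaround mentioned above can be sketched as follows. This is not from the original post: the size and path are assumptions, and DRY_RUN defaults to echo so the commands are only printed; unset it and run as root to actually create the swap.

```shell
# Sketch: create an 8 GB swap file on CentOS 7 (size/path are assumptions).
DRY_RUN="${DRY_RUN:-echo}"          # echo = dry run; unset to really apply
count=$((8 * 1024))                 # 8 GB expressed in 1 MB blocks
$DRY_RUN dd if=/dev/zero of=/swapfile bs=1M count="$count"
$DRY_RUN chmod 600 /swapfile
$DRY_RUN mkswap /swapfile
$DRY_RUN swapon /swapfile
# make the swap file permanent across reboots
$DRY_RUN sh -c 'echo "/swapfile swap swap defaults 0 0" >> /etc/fstab'
```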
1 Cluster layout
2 Prerequisites
- Install docker and docker-compose in advance.
- Install nginx, used as a reverse proxy so the three nodes' web UIs can be reached from a local browser; the nginx.conf used is given below.
3 Open the following ports in the host firewall
| Container port | hadoop102 | hadoop103 | hadoop104 |
|---|---|---|---|
| 22 (ssh) | 20022 | 30022 | 40022 |
| 8042 | 28042 | 38042 | 48042 |
| 8088 | 28088 | 38088 | 48088 |
| 9864 | 29864 | 39864 | 49864 |
| 9868 | 29868 | 39868 | 49868 |
| 9870 | 29870 | 39870 | 49870 |
| 19888 | 19888 | - | - |
In total the host must open the 19 mapped ports listed above, plus port 80, which is used to check whether nginx started successfully.
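With firewalld on CentOS 7 the whole list can be opened in a loop. A hedged sketch (not from the original post): the firewall-cmd calls are only printed here; remove the leading echo and run as root to actually apply them.

```shell
# Build the list of mapped host ports from the table above:
# 19888, plus {2,3,4} x {0022, 8042, 8088, 9864, 9868, 9870}.
ports="19888"
for prefix in 2 3 4; do
  for p in 0022 8042 8088 9864 9868 9870; do
    ports="$ports ${prefix}${p}"
  done
done
# Dry run: print the firewalld commands (drop the echo to apply as root).
for port in $ports; do
  echo firewall-cmd --permanent --add-port="${port}/tcp"
done
echo firewall-cmd --permanent --add-port=80/tcp   # nginx check page
echo firewall-cmd --reload
```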
4 Deployment
4.1 Deployment script directory
Upload hadoop_docker to the Linux server. CentOS 7 is used here, and some install commands in the scripts are CentOS-specific. Use the same Hadoop version, hadoop-3.1.3: one configuration item differs in later versions, so matching the version avoids problems later on.
4.2 Files under the hadoop_docker directory
[1] hadoop_docker/config-default
The hadoop_docker/config-default directory contains exactly the contents of the hadoop-3.1.3/etc/hadoop directory from the extracted hadoop-3.1.3.tar.gz.
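The directory can be populated from the tarball with a small helper. A sketch under the layout above; the function name is my own, not from the original scripts.

```shell
# Copy the stock etc/hadoop configuration out of the Hadoop tarball
# into config-default (paths follow the layout described above).
populate_config_default() {
  local tarball="$1" dest="$2" tmp
  tmp="$(mktemp -d)"
  tar -zxf "$tarball" -C "$tmp"        # unpacks hadoop-3.1.3/
  mkdir -p "$dest"
  # the archive contains exactly one top-level directory
  cp -r "$tmp"/*/etc/hadoop/. "$dest"/
  rm -rf "$tmp"
}
# usage: populate_config_default hadoop-3.1.3.tar.gz hadoop_docker/config-default
```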
[2] hadoop_docker/config-site
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Address of the NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.1.3/data</value>
<description>A base for other temporary directories.</description>
</property>
<!-- Static user for the HDFS web UI: root -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
</configuration>
hadoop-env.sh
Modify the following environment variables:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
<description>
The secondary namenode http server address and port.
</description>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
</property>
</configuration>
workers
hadoop102
hadoop103
hadoop104
yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<description>A comma separated list of services where service name should only
contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<property>
<description>Environment variables that containers may override rather than use NodeManager's default.</description>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
[3] docker-compose.yaml
version: '3'
services:
  hadoop102:
    image: hadoop:v1
    ports:
      - "19888:19888"
      - "20022:22"
      - "28042:8042"
      - "28088:8088"
      - "29864:9864"
      - "29868:9868"
      - "29870:9870"
    expose:
      - 8020
      - 10020
    privileged: true
    volumes:
      - /opt/hadoop/hadoop102/data:/opt/hadoop-3.1.3/data:rw
      - /opt/hadoop/hadoop102/etc/hadoop:/opt/hadoop-3.1.3/etc/hadoop:rw
    container_name: hadoop102
    hostname: hadoop102
    networks:
      mynet:
        ipv4_address: 172.16.21.102
    command: /usr/sbin/init
    restart: always
  hadoop103:
    image: hadoop:v1
    ports:
      - "30022:22"
      - "38042:8042"
      - "38088:8088"
      - "39864:9864"
      - "39868:9868"
      - "39870:9870"
    expose:
      - 8020
      - 10020
    privileged: true
    volumes:
      - /opt/hadoop/hadoop103/data:/opt/hadoop-3.1.3/data:rw
      - /opt/hadoop/hadoop103/etc/hadoop:/opt/hadoop-3.1.3/etc/hadoop:rw
    container_name: hadoop103
    hostname: hadoop103
    networks:
      mynet:
        ipv4_address: 172.16.21.103
    command: /usr/sbin/init
    restart: always
  hadoop104:
    image: hadoop:v1
    ports:
      - "40022:22"
      - "48042:8042"
      - "48088:8088"
      - "49864:9864"
      - "49868:9868"
      - "49870:9870"
    expose:
      - 8020
      - 10020
    privileged: true
    volumes:
      - /opt/hadoop/hadoop104/data:/opt/hadoop-3.1.3/data:rw
      - /opt/hadoop/hadoop104/etc/hadoop:/opt/hadoop-3.1.3/etc/hadoop:rw
    container_name: hadoop104
    hostname: hadoop104
    networks:
      mynet:
        ipv4_address: 172.16.21.104
    command: /usr/sbin/init
    restart: always
networks:
  mynet:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.16.21.0/24
          gateway: 172.16.21.1
[4] Dockerfile
FROM centos:7
MAINTAINER ChasingDreams
WORKDIR /opt
USER root
COPY xsync myhadoop jpsall ./
COPY jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz hadoop_image.sh ./
RUN chmod +x hadoop_image.sh && ./hadoop_image.sh && rm hadoop_image.sh -rf
CMD /bin/bash
[5] hadoop_image.sh
#! /bin/bash
# 1 Extract the JDK
tar -zxf jdk-8u212-linux-x64.tar.gz
# 2 Extract Hadoop
tar -zxf hadoop-3.1.3.tar.gz
# 3 Configure JDK and Hadoop environment variables
cat >> /etc/profile.d/my_env.sh << EOF
# JAVA_HOME
export JAVA_HOME=/opt/jdk1.8.0_212
export PATH=\$PATH:\$JAVA_HOME/bin
# HADOOP_HOME
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=\$PATH:\$HADOOP_HOME/bin
export PATH=\$PATH:\$HADOOP_HOME/sbin
EOF
# 4 Remove the JDK/Hadoop tarballs
rm jdk-8u212-linux-x64.tar.gz -rf
rm hadoop-3.1.3.tar.gz -rf
rm config-site.tar.gz -rf
rm config-site -rf
# 5 Install rsync
yum -y install rsync
systemctl enable rsyncd.service
# 6 Make xsync|myhadoop|jpsall executable and move them into the bin directory
chmod +x /opt/xsync /opt/myhadoop /opt/jpsall
mkdir /root/bin
mv /opt/xsync /root/bin/
mv /opt/myhadoop /root/bin/
mv /opt/jpsall /root/bin/
# 7 Install openssh-server
yum install -y openssl openssh-server openssh-clients
systemctl enable sshd.service
sed -i '/^#PermitRootLogin yes$/cPermitRootLogin yes' /etc/ssh/sshd_config
sed -i '/^UsePAM yes$/cUsePAM no' /etc/ssh/sshd_config
sed -i '/^#PubkeyAuthentication yes$/cPubkeyAuthentication yes' /etc/ssh/sshd_config
# 8 Set the root password
echo 000000 | passwd --stdin root
# 9 Set the system locale
echo "export LC_ALL=en_US.utf8" >> /etc/profile
echo "export LANG=en_US.utf8" >> /etc/profile
# 10 Set the system time zone
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[6] image-container.sh
#! /bin/bash
# 0 Remove data and configuration left over from the previous cluster run
if [ -d /opt/hadoop ]; then
rm /opt/hadoop/* -rf
fi
# 1 build hadoop image
docker build -t hadoop:v1 .
echo "========= Building image successfully!!! ========="
# 2 Host-side mapped directories and configuration files
sed -i 's/\r$//g' config-site/workers # convert to unix line endings; dos line endings keep the DataNodes from starting
hadoop_cluster_dir=/opt/hadoop
for hadoop in hadoop102 hadoop103 hadoop104
do
mkdir -p ${hadoop_cluster_dir}/${hadoop}/etc/hadoop
\cp -rf config-default/* ${hadoop_cluster_dir}/${hadoop}/etc/hadoop
\cp -f config-site/* ${hadoop_cluster_dir}/${hadoop}/etc/hadoop
done
echo "========= Configuration copy complete!!! ========="
# 3 deploy hadoop cluster
docker-compose -f ./docker-compose.yaml up -d
echo "========= Starting cluster successfully!!! ========="
# 4 Passwordless ssh between hadoop102 | hadoop103 | hadoop104
expect_pkg_name=$(rpm -qa | grep expect)
if [ -z "${expect_pkg_name}" ]; then
yum install -y expect
fi
hadoop102=172.16.21.102
hadoop103=172.16.21.103
hadoop104=172.16.21.104
for hadoop in ${hadoop102} ${hadoop103} ${hadoop104}
do
sed -i "/${hadoop}/d" /root/.ssh/known_hosts
done
password=000000
for hadoop in ${hadoop102} ${hadoop103} ${hadoop104}
do
expect <<-EOF
send_user "=============== ${hadoop} generate pri-pub key: start ===============\n"
spawn ssh root@${hadoop} ssh-keygen -t rsa
expect {
"(yes/no)?" {send "yes\n";exp_continue}
"password:" {send "${password}\n"}
}
expect "(/root/.ssh/id_rsa):"
send "\n"
expect "passphrase):"
send "\n"
expect "again:"
send "\n"
expect eof
send_user "=============== ${hadoop} generate pri-pub key: end ===============\n"
EOF
done
for hadoop in hadoop102 hadoop103 hadoop104
do
echo "=============== Copying ${hadoop} pri-pub key: start ==============="
docker cp ${hadoop}:/root/.ssh/id_rsa.pub ./
cat id_rsa.pub >> authorized_keys
rm id_rsa.pub -f
echo "=============== Copying ${hadoop} pri-pub key: end ==============="
done
for hadoop in hadoop102 hadoop103 hadoop104
do
echo "=============== Copying authorized_keys to ${hadoop}: start ==============="
docker cp authorized_keys ${hadoop}:/root/.ssh/
echo "=============== Copying authorized_keys to ${hadoop}: end ==============="
done
rm authorized_keys -f
echo "=============== Interconnection between containers: start ==============="
for hadoop1 in hadoop102 hadoop103 hadoop104
do
for hadoop2 in hadoop102 hadoop103 hadoop104
do
if [ ${hadoop1} != ${hadoop2} ]; then
expect <<-EOF
spawn docker exec -it ${hadoop1} ssh root@${hadoop2}
expect "(yes/no)?"
send "yes\n"
set timeout 1
expect eof
EOF
fi
done
done
echo "=============== Interconnection between containers: end ==============="
[7] jpsall
#! /bin/bash
for host in hadoop102 hadoop103 hadoop104
do
echo "======================== ${host} ========================"
ssh root@${host} jps
done
[8] myhadoop
#! /bin/bash
if [ $# -lt 1 ]; then
echo "No Args Input..."
exit;
fi
case ${1} in
"start")
echo "================ Starting hadoop cluster ================"
echo "---------------- starting hdfs ----------------"
ssh root@hadoop102 "/opt/hadoop-3.1.3/sbin/start-dfs.sh"
echo "---------------- starting yarn ----------------"
ssh root@hadoop103 "/opt/hadoop-3.1.3/sbin/start-yarn.sh"
echo "---------------- starting historyserver ----------------"
ssh root@hadoop102 "/opt/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
echo "================ Stopping hadoop cluster ================"
echo "---------------- stopping historyserver ----------------"
ssh root@hadoop102 "/opt/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
echo "---------------- stopping yarn ----------------"
ssh root@hadoop103 "/opt/hadoop-3.1.3/sbin/stop-yarn.sh"
echo "---------------- stopping hdfs ----------------"
ssh root@hadoop102 "/opt/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
;;
esac
[9] xsync
#! /bin/bash
# 1 Check the argument count
if [ $# -lt 1 ];then
echo Not Enough Arguments!
exit
fi
# 2 Iterate over every host in the cluster
for host in hadoop102 hadoop103 hadoop104
do
# 3 Send each given path to the host, one by one
cur_hostname=$(cat /etc/hostname)
if [ ${cur_hostname} != ${host} ]; then
echo ================= $host =================
for file in $@
do
# 4 Check that the file exists
if [ -e $file ];then
# 5 Get the parent directory
pdir=$(cd -P $(dirname $file); pwd)
# 6 Get the file name
fname=$(basename $file)
ssh $host "mkdir -p $pdir"
rsync -av $pdir/$fname $host:$pdir
else
echo $file does not exist!
fi
done
fi
done
[10] Download matching versions of the JDK and Hadoop.
5 Run the deployment
cd hadoop_docker
./image-container.sh
6 Start the cluster
[1] hadoop102
[root@hadoop102 ~]# hdfs namenode -format
[root@hadoop102 ~]# cd /opt/hadoop-3.1.3/
[root@hadoop102 hadoop-3.1.3]# sbin/start-dfs.sh
[2] hadoop103
[root@hadoop103 ~]# cd /opt/hadoop-3.1.3/
[root@hadoop103 hadoop-3.1.3]# sbin/start-yarn.sh
7 Check that the cluster started
7.1 Check with jps
Run jps on each node; if the running processes match the table above, the cluster is up.
[1]hadoop102
[root@hadoop102 hadoop-3.1.3]# jps
1618 Jps
842 NameNode
1436 NodeManager
1021 DataNode
[2]hadoop103
[root@hadoop103 hadoop-3.1.3]# jps
1184 Jps
275 DataNode
826 NodeManager
669 ResourceManager
[3]hadoop104
[root@hadoop104 ~]# jps
368 SecondaryNameNode
262 DataNode
509 NodeManager
4607 Jps
7.2 Upload-file test
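The original gives no commands for this test. A minimal sketch, run inside hadoop102 once the cluster is up; the file name and HDFS path are illustrative, and HDFS_CMD defaults to a dry-run echo (set HDFS_CMD="hadoop fs" on the cluster to actually run it).

```shell
# Dry-run sketch of an HDFS upload test (paths are illustrative).
HDFS_CMD="${HDFS_CMD:-echo hadoop fs}"   # echo = dry run
echo "hello hadoop" > word.txt           # small local test file
$HDFS_CMD -mkdir -p /input
$HDFS_CMD -put word.txt /input/word.txt
$HDFS_CMD -cat /input/word.txt           # should print the file contents
```

The /input/word.txt path matches the input used by the wordcount test in section 9.3.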
8 Web access configuration
8.1 Configure nginx
nginx.conf
#user nobody;
worker_processes 1;
#error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
#pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
#log_format main '$remote_addr - $remote_user [$time_local] "$request" '
# '$status $body_bytes_sent "$http_referer" '
# '"$http_user_agent" "$http_x_forwarded_for"';
#access_log logs/access.log main;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
server {
listen 80;
server_name localhost;
#charset koi8-r;
#access_log logs/host.access.log main;
location / {
root html;
index index.html index.htm;
}
#error_page 404 /404.html;
# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
server {
listen 8042;
server_name hadoop102 hadoop103 hadoop104;
location / {
if ($host = hadoop102) {
proxy_pass http://127.0.0.1:28042;
}
if ($host = hadoop103) {
proxy_pass http://127.0.0.1:38042;
}
if ($host = hadoop104) {
proxy_pass http://127.0.0.1:48042;
}
}
}
server {
listen 8088;
server_name hadoop102 hadoop103 hadoop104;
location / {
if ($host = hadoop102) {
proxy_pass http://127.0.0.1:28088;
}
if ($host = hadoop103) {
proxy_pass http://127.0.0.1:38088;
}
if ($host = hadoop104) {
proxy_pass http://127.0.0.1:48088;
}
}
}
server {
listen 9864;
server_name hadoop102 hadoop103 hadoop104;
location / {
if ($host = hadoop102) {
proxy_pass http://127.0.0.1:29864;
}
if ($host = hadoop103) {
proxy_pass http://127.0.0.1:39864;
}
if ($host = hadoop104) {
proxy_pass http://127.0.0.1:49864;
}
}
}
server {
listen 9868;
server_name hadoop102 hadoop103 hadoop104;
location / {
if ($host = hadoop102) {
proxy_pass http://127.0.0.1:29868;
}
if ($host = hadoop103) {
proxy_pass http://127.0.0.1:39868;
}
if ($host = hadoop104) {
proxy_pass http://127.0.0.1:49868;
}
}
}
server {
listen 9870;
server_name hadoop102 hadoop103 hadoop104;
location / {
if ($host = hadoop102) {
proxy_pass http://127.0.0.1:29870;
}
if ($host = hadoop103) {
proxy_pass http://127.0.0.1:39870;
}
if ($host = hadoop104) {
proxy_pass http://127.0.0.1:49870;
}
}
}
}
Replace the server's existing nginx.conf with this file.
8.2 Start nginx
Go to the /usr/local/nginx/sbin directory and run:
./nginx
8.3 Check that nginx started
Open http://[ip of server]:80 in a local browser; if nginx started successfully, its default welcome page is returned.
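The same check can be done from a terminal. A sketch, where SERVER_IP is a placeholder for the server's address; the command is only printed here, and running it should return HTTP status 200 when nginx is up.

```shell
# Build a curl-based health check for nginx (SERVER_IP is a placeholder).
SERVER_IP="${SERVER_IP:-ip_of_server}"
check_cmd="curl -s -o /dev/null -w '%{http_code}' http://${SERVER_IP}/"
echo "$check_cmd"   # run this yourself; 200 means nginx is serving
```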
8.4 Configure the local hosts file
Append the following to the end of C:\Windows\System32\drivers\etc\hosts:
ip_of_server hadoop102
ip_of_server hadoop103
ip_of_server hadoop104
ip_of_server is the remote server's IP; all three entries use the same IP because all three nodes run on that server.
8.5 Check access to the hadoop cluster
Open http://hadoop102:9870 in a local browser; the NameNode web UI should load.
Then open http://hadoop103:8088; the YARN ResourceManager UI should load.
9 JobHistory server
9.1 Start the JobHistory server
[root@hadoop102 ~]# cd /opt/hadoop-3.1.3/
[root@hadoop102 hadoop-3.1.3]# bin/mapred --daemon start historyserver
9.2 Check with jps
[root@hadoop102 hadoop-3.1.3]# jps
2048 JobHistoryServer
1618 Jps
842 NameNode
1436 NodeManager
1021 DataNode
9.3 Check in the web UI
[1] Wordcount test
[root@hadoop102 ~]# cd /opt/hadoop-3.1.3/
[root@hadoop102 hadoop-3.1.3]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input/word.txt /output
[2] Click through the finished job in the web UI
9.4 Check log aggregation
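The original checks log aggregation through the web UI only. As a sketch, the aggregated logs can also be fetched from the CLI on hadoop102; the application id below is a placeholder, and the commands are echoed (dry run) — drop the leading echo to run them for real.

```shell
# Placeholder application id -- list the real ids first with the
# `yarn application -list` command below.
app_id="application_1234567890123_0001"
echo yarn application -list -appStates FINISHED
echo yarn logs -applicationId "$app_id"
```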
10 Helper scripts myhadoop | jpsall
10.1 Start/stop the cluster with myhadoop
The following can be run from any of hadoop102|hadoop103|hadoop104:
myhadoop start
myhadoop stop
10.2 Check every service's status with jpsall
The following can be run from any of hadoop102|hadoop103|hadoop104:
jpsall