Hadoop Lab Report


I. Installing and Running Hadoop on Linux

Objective, Requirements, and Environment

  • Objective: install and run Hadoop on Linux

  • Requirements: correctly install and configure Hadoop on Linux, run it successfully, and then perform the specified operations on HDFS

  • Environment

    • VMware Workstation 16 Pro

    • A VMware virtual machine running Ubuntu 14.04

Procedure

Step 1: Download the Hadoop archive

Download the Hadoop archive with the following command:

wget https://mirrors.bit.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
Step 2: Install OpenJDK

Install OpenJDK with the following command:

sudo apt-get install openjdk-8-jdk
Step 3: Extract the Hadoop archive and edit the configuration files

First, extract the archive and enter the hadoop-2.10.1 directory:

tar xf hadoop-2.10.1.tar.gz
cd hadoop-2.10.1

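Then open the following three files and edit them as the lab tutorial describes. This sets the Java path (JAVA_HOME in hadoop-env.sh), the default file system (fs.defaultFS in core-site.xml), and the HDFS settings (hdfs-site.xml):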

vim etc/hadoop/hadoop-env.sh
vim etc/hadoop/core-site.xml
vim etc/hadoop/hdfs-site.xml
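Step 4: Install openssh-server and set up passwordless SSH

The Hadoop start scripts log in to the nodes over SSH. In this lab's single-machine, pseudo-distributed setup, that means SSH to localhost. Passwordless SSH therefore has to be configured as follows: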

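  1. Install openssh-server:
sudo apt-get install openssh-server
  2. Generate a public/private key pair, leaving the passphrase empty:
ssh-keygen -t dsa
  3. Append the public key to the authorized keys file:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  4. As the Apache documentation recommends, change the permissions of authorized_keys from 664 to 600:
chmod 0600 ~/.ssh/authorized_keys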
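Item 4 above tightens authorized_keys from 664 to 600. A likely reason is sshd's default StrictModes setting. Under it, sshd refuses an authorized_keys file that group or others can write, so with mode 664 the start scripts would still prompt for a password. The Java sketch below expresses that check with java.nio's POSIX permission API. The class and method names are hypothetical and are not part of the lab.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class AuthorizedKeysPermissions {

    // True if group or others may write the file. With StrictModes enabled,
    // sshd ignores an authorized_keys file in that state.
    public static boolean isGroupOrOtherWritable(Set<PosixFilePermission> perms) {
        return perms.contains(PosixFilePermission.GROUP_WRITE)
                || perms.contains(PosixFilePermission.OTHERS_WRITE);
    }

    public static void main(String[] args) throws IOException {
        // Modes 664 and 600 written in the symbolic form shown by `ls -l`.
        System.out.println("664 too open? "
                + isGroupOrOtherWritable(PosixFilePermissions.fromString("rw-rw-r--"))); // true
        System.out.println("600 too open? "
                + isGroupOrOtherWritable(PosixFilePermissions.fromString("rw-------"))); // false

        // Optionally check a real file, e.g. ~/.ssh/authorized_keys (POSIX file systems only).
        if (args.length > 0) {
            Set<PosixFilePermission> actual = Files.getPosixFilePermissions(Paths.get(args[0]));
            System.out.println(args[0] + ": " + PosixFilePermissions.toString(actual)
                    + (isGroupOrOtherWritable(actual) ? " (too open)" : " (ok)"));
        }
    }
}
```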
Step 5: Format HDFS

From the Hadoop home directory, format HDFS with:

bin/hdfs namenode -format
Step 6: Start the NameNode and DataNode daemons

Start the distributed file system, i.e. the NameNode and DataNode daemons, with:

sbin/start-dfs.sh

The jps command then lists the NameNode and DataNode processes, as shown in Figure 1.1.


Figure 1.1 Checking NameNode and DataNode with jps
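The daemons can also be checked by parsing the jps output instead of reading the screenshot. The sketch below is a hypothetical helper, not part of the lab materials. It assumes that start-dfs.sh in pseudo-distributed mode starts NameNode, DataNode and SecondaryNameNode. The PIDs in the sample are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JpsCheck {

    // HDFS daemons that start-dfs.sh is assumed to launch in pseudo-distributed mode.
    static final List<String> EXPECTED = Arrays.asList("NameNode", "DataNode", "SecondaryNameNode");

    // Takes the text printed by `jps` ("<pid> <MainClass>" per line) and
    // returns the expected daemons that do not appear in it.
    public static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2) {
                running.add(fields[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical jps output with made-up PIDs.
        String sample = "2401 NameNode\n2563 DataNode\n2779 SecondaryNameNode\n2901 Jps\n";
        System.out.println("Missing: " + missingDaemons(sample)); // Missing: []
    }
}
```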

Step 7: Try a basic operation in HDFS

Create a directory named "<student ID>-<name>" in HDFS and list the root directory:

bin/hdfs dfs -mkdir /212050-huyu
bin/hdfs dfs -ls /

Figure 1.2 shows that HDFS is working.


Figure 1.2 Trying a basic operation in HDFS

Step 8: Copy a plain-text file from the local disk to HDFS

Create miao.txt on the local disk with the content given in the lab tutorial, then copy it into HDFS:

bin/hdfs dfs -put miao.txt /miao/

Figure 1.3 shows that miao.txt was copied to HDFS.


Figure 1.3 Copying miao.txt from the local disk to HDFS and viewing it

Problems and Solutions

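  1. Problem: installing Java failed with "E: Unable to locate package openjdk-8-jdk".

    Solution: run the following two commands, then rerun the installation command:

    sudo add-apt-repository ppa:openjdk-r/ppa
    sudo apt-get update
    
  2. Problem: starting the NameNode and DataNode daemons failed with "Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured."

    Solution: I rechecked the configuration files from Step 3 and found that fs.defaultFS had been mistyped as fs.defautFS in core-site.xml. After I corrected it and reformatted HDFS, the daemons started normally.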
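Problem 2 can be reproduced in miniature. Hadoop does not reject unknown property names, so a misspelled fs.defautFS is silently ignored. fs.defaultFS then keeps its default of file:///, which leaves no NameNode address configured. The sketch below uses only the JDK's XML parser and looks a property up by its exact name. The class name is hypothetical. The value hdfs://localhost:9000 comes from the Apache single-node setup guide and may differ from the value used in this lab.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CoreSiteCheck {

    // Text of the first <tag> child of a <property>, or null if it is missing.
    private static String childText(Element property, String tag) {
        Node node = property.getElementsByTagName(tag).item(0);
        return node == null ? null : node.getTextContent().trim();
    }

    // Returns the <value> of the property whose <name> equals `wanted`,
    // or null if no such property exists (e.g. because the name is misspelled).
    public static String propertyValue(String xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (wanted.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("core-site.xml could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String correct = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>hdfs://localhost:9000</value></property></configuration>";
        String typo = "<configuration><property><name>fs.defautFS</name>"
                + "<value>hdfs://localhost:9000</value></property></configuration>";
        System.out.println(propertyValue(correct, "fs.defaultFS")); // hdfs://localhost:9000
        System.out.println(propertyValue(typo, "fs.defaultFS"));    // null
    }
}
```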

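Summary

In this part, I got Hadoop running on a VMware virtual machine with Ubuntu 14.04, performed a basic operation in HDFS, and copied a file from the local disk into HDFS.

This part taught me how to install and configure Hadoop and how to perform basic operations on HDFS. Solving the problems I ran into also deepened my understanding of Hadoop.

Overall, this part laid the groundwork for the two experiments that follow.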

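II. Running Word Count in MapReduce

Objective, Requirements, and Environment

  • Objective: implement a simple MapReduce example program, Word Count, to learn the basics of MapReduce and prepare for the next experiment

  • Requirements: use MapReduce in the Hadoop framework to obtain correct Word Count results

  • Environment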

    • VMware Workstation 16 Pro

    • A VMware virtual machine running Ubuntu 14.04 with Hadoop installed

Procedure

Step 1: Install Maven and switch the repository mirror

Install Maven with:

sudo apt-get install maven

Then edit settings.xml to use the Alibaba Cloud (Aliyun) Maven mirror.

Step 2: Create the project

Create the Maven project with:

mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=cn.edu.seu.huyu -DartifactId=wordcount -DpackageName=cn.edu.seu.huyu -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
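Step 3: Edit pom.xml and download the dependencies and plugins

Open pom.xml with the following command and edit it as the tutorial describes, adding Hadoop and the other required dependencies and plugins: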

sudo vim ~/.m2/wordcount/pom.xml

After confirming the edits are correct, download the dependencies and plugins with:

mvn clean install -DskipTests
Step 4: Write the Java code for WordCount

Create WordCount.java in src/main/java/cn/edu/seu/huyu and write the code.
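The report does not include WordCount.java. The plain-Java sketch below gives a rough idea of what its Mapper, Combiner and Reducer do. It simulates the map, combine, shuffle/sort and reduce stages in memory and does not use the Hadoop API. The class name is hypothetical. Splitting on whitespace is an assumption, loosely modeled on the StringTokenizer-based mapper in the Apache MapReduce tutorial.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSimulation {

    // Map: emit (word, 1) for each whitespace-separated token of one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Combine: pre-aggregate one mapper's output locally (same logic as the reducer).
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            partial.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return partial;
    }

    // Shuffle/sort: group all partial counts by word; TreeMap keeps the keys sorted,
    // like the sorted input a reducer receives.
    public static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> partials) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> part : partials) {
            for (Map.Entry<String, Integer> e : part.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Reduce: sum the partial counts of each word.
    public static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Whole pipeline; each input line plays the role of one map task's split.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String line : lines) {
            partials.add(combine(map(line)));
        }
        return reduce(shuffle(partials));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat sat", "the cat ran");
        System.out.println(wordCount(lines)); // {cat=2, ran=1, sat=1, the=2}
    }
}
```

The combiner reuses the reducer's summing logic. This only works because addition is associative and commutative, which is also why the classic Hadoop example can register the same class as both combiner and reducer.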

Step 5: Package the project as a jar

Package the project as a jar with:

mvn clean install -DskipTests

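Because I wrote the code in vim rather than an IDE, I checked it carefully several times before running the command above. The result is shown in Figure 2.1.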


Figure 2.1 Building the jar for the WordCount project

Step 6: Submit and run the jar

Submit the jar and run it on Hadoop with:

bin/hadoop jar ~/.m2/wordcount/target/wordcount-1.0-SNAPSHOT.jar cn.edu.seu.huyu.WordCount /miao/miao.txt /miao/output
Step 7: View and check the results

The command in Step 6 writes its output to the HDFS directory /miao/output, so view the results in HDFS with the following two commands:

bin/hdfs dfs -ls /miao/output
bin/hdfs dfs -cat /miao/output/part-r-00000

Figure 2.2 shows the results. I picked several words from the output and counted them by hand in the original text, and the counts matched.


Figure 2.2 Results of Experiment 2
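The manual spot check in Step 7 can be made repeatable. By default, Hadoop's TextOutputFormat writes one "key<TAB>value" line per word to part-r-00000. That file can be parsed and compared against a local recount of the input. The helper below is hypothetical. It assumes the job tokenizes on whitespace, and its sample data is made up rather than taken from miao.txt.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputSpotCheck {

    // Parses "word<TAB>count" lines as written by the default TextOutputFormat.
    public static Map<String, Integer> parsePartFile(String content) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : content.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab separator in line: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    // Counts one word in the original text, splitting on whitespace like the mapper is assumed to.
    public static int countInText(String text, String word) {
        int n = 0;
        for (String token : text.split("\\s+")) {
            if (token.equals(word)) {
                n++;
            }
        }
        return n;
    }

    // True if every sampled word has the same count in the job output and in the text.
    public static boolean spotCheck(String partFile, String originalText, String... sample) {
        Map<String, Integer> reported = parsePartFile(partFile);
        for (String w : sample) {
            if (reported.getOrDefault(w, 0) != countInText(originalText, w)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String part = "cat\t2\nran\t1\nsat\t1\nthe\t2\n";
        String text = "the cat sat\nthe cat ran";
        System.out.println(spotCheck(part, text, "the", "cat", "ran")); // true
    }
}
```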

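Problems and Solutions

  1. Problem: creating the Maven project failed with "The goal you specified requires a project to execute but there is no POM in this directory (/home/huvu/.m2). Please verify you invoked Maven from the correct directory".

    Solution: I found that I had typed the project-creation command as ... - Dversion=1.0-SNAPSHOT ..., with a space between "-" and "Dversion". After I fixed that, the project was created normally.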

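Summary

In this part, I implemented a simple MapReduce example program, Word Count. I wrote the Java code for the Mapper, Combiner, and Reducer, packaged the Maven project as a jar, and submitted it to Hadoop. This produced word counts for miao.txt, the file created in Part I. A manual comparison confirmed that the counts are correct.

This part also involved a lot of manual typing, and my carelessness caused many mistakes (for example, Problem 1). Looking up material and documentation online to fix them deepened my understanding of Hadoop, but it also made the experiment take much longer. In future experiments and in my own research, I will be more careful to avoid such basic mistakes.

In short, working through the Word Count example reinforced the Hadoop material taught in class and gave me confidence for the next experiment.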

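III. Running KMeans in MapReduce

Objective, Requirements, and Environment

  • Objective: run KMeans in the MapReduce framework

  • Requirements: obtain good clustering results with MapReduce

  • Environment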

    • VMware Workstation 16 Pro

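    • A VMware virtual machine running Ubuntu 14.04 with Hadoop installed

Procedure

Step 1: Prepare the KMeans programs and data set, and initialize

Copy the provided KMeans.zip into the VMware virtual machine and extract it. Then run ProcessCorpus with the following command. It selects the data path, converts the texts into bag-of-words vectors, and sets the maximum number of iterations and the feature-vector dimension: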

java -jar ProcessCorpus.jar

Next, run GetCentroids with the following command to initialize the cluster centroids:

java -jar GetCentroids.jar
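ProcessCorpus is provided only as a jar, and its exact output format is not described here. The Java sketch below just illustrates the bag-of-words idea it relies on. It keeps the `dim` most frequent terms as the vocabulary, with `dim` standing in for the feature-vector dimension, and maps each document to a vector of term counts. The tokenization, the frequency-based vocabulary and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class BagOfWordsSketch {

    // Lower-cases the text and splits on runs of non-letters (a simplifying assumption).
    static String[] tokenize(String doc) {
        return doc.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    // Vocabulary = the `dim` most frequent terms in the corpus, ties broken alphabetically.
    public static List<String> buildVocabulary(List<String> docs, int dim) {
        Map<String, Integer> freq = new HashMap<>();
        for (String doc : docs) {
            for (String t : tokenize(doc)) {
                if (!t.isEmpty()) {
                    freq.merge(t, 1, Integer::sum);
                }
            }
        }
        List<String> terms = new ArrayList<>(freq.keySet());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(freq.get(b), freq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(dim, terms.size())));
    }

    // Bag-of-words vector: entry i counts occurrences of vocab.get(i); other words are dropped.
    public static double[] vectorize(String doc, List<String> vocab) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocab.size(); i++) {
            index.put(vocab.get(i), i);
        }
        double[] v = new double[vocab.size()];
        for (String t : tokenize(doc)) {
            Integer pos = index.get(t);
            if (pos != null) {
                v[pos] += 1.0;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The space shuttle launch",
                "Hockey game and the launch of the season");
        List<String> vocab = buildVocabulary(docs, 3);
        System.out.println(vocab); // [the, launch, and]
        for (String d : docs) {
            System.out.println(Arrays.toString(vectorize(d, vocab)));
        }
        // prints [1.0, 1.0, 0.0] then [2.0, 1.0, 1.0]
    }
}
```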
Step 2: Copy the vectors and clusters files to HDFS

Create the target directories and copy the vectors and clusters files into HDFS:

../hadoop-2.10.1/bin/hdfs dfs -mkdir /data
../hadoop-2.10.1/bin/hdfs dfs -mkdir /clusters
../hadoop-2.10.1/bin/hdfs dfs -copyFromLocal vectors /data 
../hadoop-2.10.1/bin/hdfs dfs -copyFromLocal clusters /clusters
Step 3: Build MapRedKMeans.jar

First, create the Maven project:

mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=cn.edu.seu.huyu -DartifactId=kmeans -DpackageName=cn.edu.seu.huyu -Dversion=1.0-SNAPSHOT -DinteractiveMode=false

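Then edit pom.xml and download the dependencies and plugins as in Step 3 of Part II. Next, copy all files from the MapRedKMeans directory into the kmeans project's source directory: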

cp ../MapRedKMeans/* src/main/java/cn/edu/seu/XXX/

Finally, build the jar with mvn clean install -DskipTests, as in Step 5 of Part II.
Step 4: Run KMeans in MapReduce

Run KMeans in MapReduce with:

hadoop-2.10.1/bin/hadoop jar target/kmeans-1.0-SNAPSHOT.jar KMeans /data /clusters 10
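The report does not show the MapRedKMeans sources. The plain-Java code below sketches how one KMeans iteration splits into MapReduce roles. The map side assigns each vector to its nearest centroid, and the reduce side averages each group into a new centroid. A driver loop repeats this up to a maximum iteration count. Euclidean distance, keeping an empty cluster's centroid unchanged, and all names are assumptions. The provided program may use another distance (cosine similarity is common for text) and may handle empty clusters differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KMeansIterationSketch {

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Map side: the index of the nearest centroid is the key the mapper would emit.
    public static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(point, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // One iteration: group points by nearest centroid, then reduce each group to its mean.
    public static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> members = groups.get(c);
            if (members == null) {
                next[c] = centroids[c].clone(); // empty cluster: keep the old centroid
                continue;
            }
            double[] mean = new double[centroids[c].length];
            for (double[] m : members) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += m[i];
                }
            }
            for (int i = 0; i < mean.length; i++) {
                mean[i] /= members.size();
            }
            next[c] = mean;
        }
        return next;
    }

    // Driver: one "job" per iteration until the centroids stop changing or maxIter is reached.
    public static double[][] run(double[][] points, double[][] init, int maxIter) {
        double[][] current = init;
        for (int it = 0; it < maxIter; it++) {
            double[][] next = iterate(points, current);
            if (Arrays.deepEquals(next, current)) {
                break;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] result = run(points, new double[][]{{0, 0}, {10, 10}}, 10);
        System.out.println(Arrays.deepToString(result)); // [[0.0, 0.5], [10.0, 10.5]]
    }
}
```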
Step 5: View and check the results

First copy the results from HDFS back to the local disk, then run GetDistribution to inspect them. The output is shown below:

  • Clusters1

******* cluster0 ******* comp.graphics: 5; comp.windows.x: 3; sci.electronics: 1; rec.sport.hockey: 1; alt.atheism: 1; comp.os.ms-windows.misc: 1;

******* cluster1 *******

******* cluster10 ******* comp.sys.ibm.pc.hardware: 20; rec.motorcycles: 20; talk.religion.misc: 17; sci.med: 17; sci.crypt: 16; rec.autos: 15; sci.electronics: 15; rec.sport.baseball: 14; alt.atheism: 14; misc.forsale: 13; talk.politics.misc: 12; talk.politics.guns: 12; comp.windows.x: 10; comp.os.ms-windows.misc: 10; sci.space: 8; comp.graphics: 7; talk.politics.mideast: 3; comp.sys.mac.hardware: 3; rec.sport.hockey: 3;

******* cluster11 ******* misc.forsale: 12; comp.sys.ibm.pc.hardware: 2; rec.autos: 1; rec.motorcycles: 1; sci.electronics: 1; comp.windows.x: 1; sci.space: 1; rec.sport.baseball: 1; sci.med: 1; comp.os.ms-windows.misc: 1;

******* cluster12 ******* sci.crypt: 74; soc.religion.christian: 64; talk.politics.guns: 62; talk.politics.misc: 55; alt.atheism: 53; rec.motorcycles: 50; comp.windows.x: 49; sci.space: 48; talk.religion.misc: 44; sci.med: 43; talk.politics.mideast: 42; rec.autos: 41; sci.electronics: 38; rec.sport.hockey: 37; rec.sport.baseball: 34; comp.sys.mac.hardware: 33; comp.sys.ibm.pc.hardware: 28; comp.os.ms-windows.misc: 28; comp.graphics: 25; misc.forsale: 7;

******* cluster13 *******

******* cluster14 ******* rec.sport.hockey: 27; talk.religion.misc: 20; rec.motorcycles: 19; comp.os.ms-windows.misc: 18; comp.graphics: 17; comp.sys.ibm.pc.hardware: 16; sci.electronics: 16; sci.med: 15; comp.sys.mac.hardware: 15; sci.space: 14; comp.windows.x: 13; rec.autos: 11; talk.politics.misc: 8; rec.sport.baseball: 7; alt.atheism: 7; talk.politics.mideast: 6; sci.crypt: 6; misc.forsale: 3; talk.politics.guns: 1;

******* cluster15 ******* misc.forsale: 12; comp.sys.ibm.pc.hardware: 7; comp.sys.mac.hardware: 5; comp.windows.x: 4; comp.graphics: 4; sci.electronics: 4; rec.autos: 3; rec.sport.hockey: 3; comp.os.ms-windows.misc: 2; rec.motorcycles: 2; alt.atheism: 2; talk.religion.misc: 1; sci.space: 1;

******* cluster16 ******* misc.forsale: 19; soc.religion.christian: 17; rec.sport.baseball: 12; comp.graphics: 9; sci.electronics: 9; talk.religion.misc: 8; comp.windows.x: 7; comp.os.ms-windows.misc: 7; comp.sys.mac.hardware: 7; sci.space: 6; talk.politics.misc: 5; alt.atheism: 5; rec.sport.hockey: 4; talk.politics.mideast: 3; comp.sys.ibm.pc.hardware: 3; rec.motorcycles: 3; rec.autos: 2; sci.med: 2; sci.crypt: 2; talk.politics.guns: 1;

******* cluster17 ******* comp.graphics: 12; misc.forsale: 11; sci.med: 9; sci.space: 8; sci.electronics: 7; comp.sys.ibm.pc.hardware: 6; rec.autos: 5; comp.sys.mac.hardware: 5; alt.atheism: 5; rec.sport.baseball: 4; comp.os.ms-windows.misc: 4; rec.motorcycles: 4; rec.sport.hockey: 4; talk.religion.misc: 3; talk.politics.mideast: 3; talk.politics.guns: 2; sci.crypt: 1;

******* cluster18 ******* comp.sys.mac.hardware: 42; rec.sport.hockey: 41; rec.sport.baseball: 37; talk.politics.misc: 36; talk.politics.guns: 35; talk.religion.misc: 34; sci.med: 33; rec.autos: 32; sci.space: 32; soc.religion.christian: 30; comp.graphics: 28; alt.atheism: 27; talk.politics.mideast: 25; sci.electronics: 23; rec.motorcycles: 22; comp.sys.ibm.pc.hardware: 21; comp.os.ms-windows.misc: 21; sci.crypt: 21; comp.windows.x: 9; misc.forsale: 9;

******* cluster19 ******* rec.motorcycles: 2; comp.sys.mac.hardware: 1; sci.electronics: 1; comp.os.ms-windows.misc: 1;

******* cluster2 ******* talk.politics.guns: 5; sci.crypt: 3; rec.sport.baseball: 1; sci.med: 1;

******* cluster3 ******* comp.sys.ibm.pc.hardware: 5; misc.forsale: 5; talk.politics.mideast: 3; sci.crypt: 3; rec.motorcycles: 3; sci.electronics: 3; comp.graphics: 2; sci.med: 2; comp.os.ms-windows.misc: 2; comp.sys.mac.hardware: 2; rec.autos: 1; comp.windows.x: 1; talk.politics.misc: 1; rec.sport.hockey: 1; alt.atheism: 1; talk.politics.guns: 1;

******* cluster4 ******* misc.forsale: 37; rec.sport.baseball: 29; comp.sys.ibm.pc.hardware: 28; alt.atheism: 28; sci.electronics: 26; comp.windows.x: 25; comp.sys.mac.hardware: 25; comp.graphics: 24; rec.autos: 23; talk.politics.misc: 22; talk.politics.guns: 22; sci.space: 21; talk.politics.mideast: 18; sci.med: 18; rec.motorcycles: 18; rec.sport.hockey: 18; talk.religion.misc: 17; sci.crypt: 13; soc.religion.christian: 12; comp.os.ms-windows.misc: 9;

******* cluster5 ******* misc.forsale: 17; rec.autos: 11; comp.windows.x: 8; rec.sport.baseball: 8; comp.sys.ibm.pc.hardware: 8; comp.graphics: 6; comp.sys.mac.hardware: 6; rec.sport.hockey: 5; sci.space: 5; sci.electronics: 4; talk.religion.misc: 3; comp.os.ms-windows.misc: 3; sci.crypt: 3; talk.politics.misc: 3; alt.atheism: 3; talk.politics.guns: 3; talk.politics.mideast: 2; sci.med: 2; rec.motorcycles: 2;

******* cluster6 *******

******* cluster7 ******* talk.politics.mideast: 44; soc.religion.christian: 27; sci.med: 7; talk.politics.misc: 7; rec.sport.hockey: 6; sci.space: 6; talk.politics.guns: 6; rec.autos: 5; alt.atheism: 4; talk.religion.misc: 3; rec.sport.baseball: 3; rec.motorcycles: 3; sci.crypt: 2; misc.forsale: 2; comp.sys.mac.hardware: 1; sci.electronics: 1;

******* cluster8 ******* comp.os.ms-windows.misc: 43; comp.windows.x: 20; comp.graphics: 11; sci.crypt: 6; comp.sys.ibm.pc.hardware: 6; comp.sys.mac.hardware: 5; misc.forsale: 2; talk.politics.misc: 1; rec.motorcycles: 1; sci.electronics: 1; talk.politics.mideast: 1;

******* cluster9 ******* misc.forsale: 1;

  • Clusters2

******* cluster0 ******* comp.graphics: 5; comp.windows.x: 3; sci.electronics: 1; rec.sport.hockey: 1; comp.os.ms-windows.misc: 1;

******* cluster1 ******* soc.religion.christian: 39; talk.politics.guns: 31; sci.crypt: 30; talk.politics.misc: 27; talk.religion.misc: 22; alt.atheism: 22; sci.space: 21; talk.politics.mideast: 20; rec.motorcycles: 20; comp.windows.x: 18; rec.autos: 17; sci.med: 17; rec.sport.hockey: 16; sci.electronics: 15; comp.os.ms-windows.misc: 11; comp.sys.mac.hardware: 10; rec.sport.baseball: 9; comp.sys.ibm.pc.hardware: 9; comp.graphics: 4; misc.forsale: 1;

******* cluster10 ******* rec.motorcycles: 25; comp.sys.ibm.pc.hardware: 24; talk.religion.misc: 20; rec.autos: 18; sci.med: 18; sci.crypt: 18; talk.politics.misc: 17; talk.politics.guns: 17; sci.electronics: 16; alt.atheism: 16; rec.sport.baseball: 15; misc.forsale: 13; comp.os.ms-windows.misc: 12; comp.windows.x: 11; comp.graphics: 9; sci.space: 9; comp.sys.mac.hardware: 4; rec.sport.hockey: 4; talk.politics.mideast: 3;

******* cluster11 ******* misc.forsale: 13; comp.sys.ibm.pc.hardware: 2; rec.autos: 1; rec.motorcycles: 1; sci.electronics: 1; comp.windows.x: 1; sci.space: 1; rec.sport.baseball: 1; sci.med: 1; comp.os.ms-windows.misc: 1;

******* cluster12 ******* sci.crypt: 41; talk.politics.guns: 32; alt.atheism: 31; talk.politics.misc: 28; rec.motorcycles: 28; comp.windows.x: 26; soc.religion.christian: 25; sci.med: 25; talk.politics.mideast: 24; rec.autos: 23; rec.sport.baseball: 23; sci.space: 21; comp.sys.mac.hardware: 20; talk.religion.misc: 19; sci.electronics: 19; comp.sys.ibm.pc.hardware: 18; comp.graphics: 15; comp.os.ms-windows.misc: 15; rec.sport.hockey: 14; misc.forsale: 4;

******* cluster13 *******

******* cluster14 ******* rec.sport.hockey: 31; comp.graphics: 20; comp.os.ms-windows.misc: 20; sci.electronics: 19; rec.motorcycles: 18; comp.windows.x: 17; sci.space: 17; comp.sys.mac.hardware: 16; talk.religion.misc: 15; comp.sys.ibm.pc.hardware: 15; sci.med: 15; rec.autos: 11; talk.politics.mideast: 7; talk.politics.misc: 7; rec.sport.baseball: 6; sci.crypt: 6; alt.atheism: 6; misc.forsale: 3; talk.politics.guns: 1;

******* cluster15 ******* misc.forsale: 14; comp.graphics: 6; comp.sys.ibm.pc.hardware: 5; comp.sys.mac.hardware: 5; comp.windows.x: 4; rec.motorcycles: 4; rec.sport.hockey: 4; sci.electronics: 3; rec.autos: 2; talk.religion.misc: 2; comp.os.ms-windows.misc: 2; sci.space: 2; alt.atheism: 2; soc.religion.christian: 1; rec.sport.baseball: 1;

******* cluster16 ******* misc.forsale: 21; soc.religion.christian: 12; rec.sport.baseball: 11; sci.electronics: 8; comp.graphics: 7; comp.sys.mac.hardware: 7; talk.religion.misc: 6; comp.windows.x: 6; comp.os.ms-windows.misc: 5; talk.politics.misc: 5; alt.atheism: 5; sci.space: 4; rec.motorcycles: 3; rec.sport.hockey: 3; rec.autos: 2; talk.politics.mideast: 2; comp.sys.ibm.pc.hardware: 2; sci.med: 2; sci.crypt: 2;

******* cluster17 ******* comp.graphics: 14; misc.forsale: 10; sci.space: 9; comp.sys.ibm.pc.hardware: 8; sci.med: 8; comp.sys.mac.hardware: 8; sci.electronics: 7; rec.sport.baseball: 6; rec.motorcycles: 6; rec.sport.hockey: 5; alt.atheism: 5; rec.autos: 4; talk.religion.misc: 4; comp.os.ms-windows.misc: 4; talk.politics.mideast: 3; talk.politics.misc: 3; talk.politics.guns: 3; soc.religion.christian: 1; comp.windows.x: 1; sci.crypt: 1;

******* cluster18 ******* rec.sport.hockey: 40; rec.sport.baseball: 36; talk.politics.guns: 35; comp.sys.mac.hardware: 34; soc.religion.christian: 33; talk.religion.misc: 33; rec.autos: 31; sci.med: 28; sci.space: 28; talk.politics.misc: 27; talk.politics.mideast: 25; alt.atheism: 24; rec.motorcycles: 21; sci.electronics: 21; sci.crypt: 17; comp.graphics: 16; comp.os.ms-windows.misc: 13; comp.sys.ibm.pc.hardware: 12; comp.windows.x: 9; misc.forsale: 6;

******* cluster19 ******* comp.sys.mac.hardware: 1; sci.electronics: 1; comp.os.ms-windows.misc: 1;

******* cluster2 ******* talk.politics.guns: 4; sci.crypt: 3; rec.sport.baseball: 1; sci.med: 1;

******* cluster3 ******* comp.sys.ibm.pc.hardware: 5; talk.politics.mideast: 3; sci.crypt: 3; sci.electronics: 3; rec.autos: 2; comp.graphics: 2; sci.med: 2; comp.sys.mac.hardware: 2; misc.forsale: 2; rec.motorcycles: 2; comp.windows.x: 1; comp.os.ms-windows.misc: 1; talk.politics.misc: 1; rec.sport.hockey: 1; alt.atheism: 1; talk.politics.guns: 1;

******* cluster4 ******* misc.forsale: 40; comp.sys.ibm.pc.hardware: 34; alt.atheism: 31; comp.graphics: 29; comp.windows.x: 27; rec.sport.baseball: 27; sci.electronics: 26; comp.sys.mac.hardware: 25; sci.med: 23; sci.space: 23; rec.autos: 22; talk.politics.misc: 21; talk.politics.mideast: 20; talk.religion.misc: 18; comp.os.ms-windows.misc: 16; rec.motorcycles: 16; sci.crypt: 15; talk.politics.guns: 15; rec.sport.hockey: 12; soc.religion.christian: 11;

******* cluster5 ******* misc.forsale: 20; comp.sys.mac.hardware: 11; rec.autos: 10; comp.sys.ibm.pc.hardware: 9; comp.windows.x: 8; rec.sport.baseball: 8; comp.graphics: 8; rec.sport.hockey: 7; sci.space: 7; sci.electronics: 6; comp.os.ms-windows.misc: 5; talk.religion.misc: 4; talk.politics.misc: 4; talk.politics.guns: 4; talk.politics.mideast: 3; sci.med: 3; sci.crypt: 3; alt.atheism: 3; rec.motorcycles: 2;

******* cluster6 *******

******* cluster7 ******* talk.politics.mideast: 39; soc.religion.christian: 28; rec.sport.hockey: 12; talk.politics.misc: 9; sci.space: 8; rec.autos: 7; talk.religion.misc: 7; sci.med: 7; talk.politics.guns: 7; rec.sport.baseball: 6; sci.crypt: 5; comp.graphics: 4; rec.motorcycles: 4; alt.atheism: 4; sci.electronics: 3; comp.sys.mac.hardware: 2; misc.forsale: 2; comp.windows.x: 1; comp.sys.ibm.pc.hardware: 1; comp.os.ms-windows.misc: 1;

******* cluster8 ******* comp.os.ms-windows.misc: 42; comp.windows.x: 17; comp.graphics: 11; sci.crypt: 6; comp.sys.ibm.pc.hardware: 6; comp.sys.mac.hardware: 5; talk.politics.misc: 1; misc.forsale: 1; sci.electronics: 1; talk.politics.mideast: 1;

******* cluster9 *******

  • Clusters3

******* cluster0 ******* comp.windows.x: 3; comp.graphics: 3; sci.electronics: 1; rec.sport.hockey: 1; comp.os.ms-windows.misc: 1;

******* cluster1 ******* soc.religion.christian: 38; sci.crypt: 38; talk.politics.guns: 28; talk.politics.misc: 26; talk.religion.misc: 25; talk.politics.mideast: 23; sci.space: 23; alt.atheism: 19; rec.sport.hockey: 18; sci.med: 17; rec.motorcycles: 17; sci.electronics: 17; comp.windows.x: 16; rec.autos: 15; comp.sys.mac.hardware: 13; rec.sport.baseball: 12; comp.sys.ibm.pc.hardware: 10; comp.os.ms-windows.misc: 10; comp.graphics: 3; misc.forsale: 1;

******* cluster10 ******* comp.sys.ibm.pc.hardware: 27; rec.motorcycles: 26; talk.religion.misc: 21; rec.autos: 20; sci.crypt: 20; sci.med: 19; talk.politics.misc: 19; rec.sport.baseball: 17; alt.atheism: 17; talk.politics.guns: 17; sci.electronics: 15; misc.forsale: 13; comp.os.ms-windows.misc: 12; comp.windows.x: 11; sci.space: 10; comp.graphics: 9; talk.politics.mideast: 6; comp.sys.mac.hardware: 4; rec.sport.hockey: 4;

******* cluster11 ******* misc.forsale: 13; comp.sys.ibm.pc.hardware: 2; rec.autos: 1; rec.motorcycles: 1; sci.electronics: 1; comp.windows.x: 1; sci.space: 1; rec.sport.baseball: 1; sci.med: 1; comp.os.ms-windows.misc: 1;

******* cluster12 ******* sci.crypt: 37; alt.atheism: 37; talk.politics.guns: 37; talk.politics.misc: 34; comp.windows.x: 31; rec.motorcycles: 31; rec.autos: 30; sci.med: 30; soc.religion.christian: 28; talk.religion.misc: 28; talk.politics.mideast: 28; rec.sport.baseball: 22; sci.space: 21; comp.graphics: 20; sci.electronics: 20; comp.os.ms-windows.misc: 18; comp.sys.mac.hardware: 18; comp.sys.ibm.pc.hardware: 17; rec.sport.hockey: 13; misc.forsale: 5;

******* cluster13 ******* rec.sport.hockey: 22; rec.sport.baseball: 21; soc.religion.christian: 17; comp.sys.mac.hardware: 14; talk.politics.misc: 13; sci.space: 13; talk.politics.mideast: 12; talk.politics.guns: 12; sci.med: 11; rec.autos: 8; talk.religion.misc: 8; alt.atheism: 8; comp.graphics: 7; sci.electronics: 7; comp.sys.ibm.pc.hardware: 6; comp.os.ms-windows.misc: 6; rec.motorcycles: 6; comp.windows.x: 4; sci.crypt: 4; misc.forsale: 1;

******* cluster14 ******* rec.sport.hockey: 32; comp.graphics: 22; sci.electronics: 21; comp.os.ms-windows.misc: 18; rec.motorcycles: 18; sci.space: 18; comp.windows.x: 17; comp.sys.ibm.pc.hardware: 16; comp.sys.mac.hardware: 16; sci.med: 13; rec.autos: 8; talk.religion.misc: 8; sci.crypt: 6; rec.sport.baseball: 5; misc.forsale: 5; talk.politics.mideast: 4; talk.politics.misc: 4; alt.atheism: 4; talk.politics.guns: 1;

******* cluster15 ******* misc.forsale: 14; comp.graphics: 6; comp.sys.ibm.pc.hardware: 6; comp.sys.mac.hardware: 5; rec.motorcycles: 5; comp.windows.x: 4; rec.sport.hockey: 3; sci.space: 3; rec.autos: 2; talk.religion.misc: 2; rec.sport.baseball: 2; comp.os.ms-windows.misc: 2; sci.electronics: 2; alt.atheism: 2; soc.religion.christian: 1; talk.politics.mideast: 1;

******* cluster16 ******* misc.forsale: 22; soc.religion.christian: 11; rec.sport.baseball: 10; sci.electronics: 8; talk.religion.misc: 6; comp.graphics: 6; comp.windows.x: 5; comp.sys.mac.hardware: 5; comp.os.ms-windows.misc: 4; rec.sport.hockey: 4; alt.atheism: 4; sci.med: 3; talk.politics.misc: 3; rec.motorcycles: 3; sci.space: 2; rec.autos: 1; talk.politics.mideast: 1; comp.sys.ibm.pc.hardware: 1; sci.crypt: 1;

******* cluster17 ******* comp.graphics: 17; misc.forsale: 12; comp.sys.mac.hardware: 11; sci.space: 10; comp.sys.ibm.pc.hardware: 8; sci.med: 8; sci.electronics: 8; rec.autos: 6; rec.sport.baseball: 6; rec.motorcycles: 6; talk.religion.misc: 5; talk.politics.mideast: 5; comp.os.ms-windows.misc: 5; talk.politics.misc: 5; alt.atheism: 5; talk.politics.guns: 5; rec.sport.hockey: 4; comp.windows.x: 3; soc.religion.christian: 1; sci.crypt: 1;

******* cluster18 ******* rec.autos: 21; talk.politics.guns: 21; talk.religion.misc: 20; rec.sport.hockey: 19; talk.politics.misc: 17; soc.religion.christian: 16; rec.sport.baseball: 16; comp.sys.mac.hardware: 15; sci.med: 14; talk.politics.mideast: 12; rec.motorcycles: 12; sci.electronics: 12; sci.space: 12; sci.crypt: 11; alt.atheism: 10; comp.sys.ibm.pc.hardware: 5; comp.os.ms-windows.misc: 5; misc.forsale: 4; comp.graphics: 3; comp.windows.x: 2;

******* cluster19 *******

******* cluster2 ******* talk.politics.guns: 4; rec.sport.baseball: 1; sci.med: 1;

******* cluster3 ******* comp.sys.ibm.pc.hardware: 5; talk.politics.mideast: 3; sci.crypt: 3; sci.electronics: 3; rec.autos: 2; comp.graphics: 2; comp.sys.mac.hardware: 2; misc.forsale: 2; rec.motorcycles: 2; comp.windows.x: 1; sci.med: 1; comp.os.ms-windows.misc: 1; talk.politics.misc: 1; rec.sport.hockey: 1; alt.atheism: 1; talk.politics.guns: 1;

******* cluster4 ******* misc.forsale: 37; alt.atheism: 35; comp.graphics: 30; comp.sys.ibm.pc.hardware: 29; comp.sys.mac.hardware: 27; comp.windows.x: 26; sci.electronics: 26; rec.sport.baseball: 24; sci.med: 22; sci.space: 21; talk.religion.misc: 20; comp.os.ms-windows.misc: 19; rec.autos: 18; talk.politics.misc: 17; rec.motorcycles: 16; sci.crypt: 15; talk.politics.mideast: 14; talk.politics.guns: 13; soc.religion.christian: 12; rec.sport.hockey: 9;

******* cluster5 ******* misc.forsale: 18; comp.sys.mac.hardware: 13; rec.autos: 11; comp.sys.ibm.pc.hardware: 11; comp.windows.x: 9; rec.sport.hockey: 8; rec.sport.baseball: 7; comp.graphics: 7; sci.space: 7; sci.electronics: 6; comp.os.ms-windows.misc: 5; talk.religion.misc: 4; sci.med: 4; sci.crypt: 4; alt.atheism: 4; talk.politics.guns: 4; talk.politics.mideast: 3; talk.politics.misc: 3; rec.motorcycles: 2;

******* cluster6 *******

******* cluster7 ******* talk.politics.mideast: 37; soc.religion.christian: 26; rec.sport.hockey: 12; sci.space: 9; rec.autos: 7; talk.politics.misc: 7; talk.politics.guns: 7; rec.sport.baseball: 6; sci.med: 6; rec.motorcycles: 5; sci.crypt: 4; alt.atheism: 4; talk.religion.misc: 3; comp.graphics: 3; comp.sys.mac.hardware: 2; misc.forsale: 2; sci.electronics: 2; comp.windows.x: 1; comp.sys.ibm.pc.hardware: 1; comp.os.ms-windows.misc: 1;

******* cluster8 ******* comp.os.ms-windows.misc: 42; comp.windows.x: 16; comp.graphics: 12; sci.crypt: 6; comp.sys.ibm.pc.hardware: 6; comp.sys.mac.hardware: 5; talk.politics.misc: 1; misc.forsale: 1; sci.electronics: 1; talk.politics.mideast: 1;

******* cluster9 *******
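The GetDistribution listings are easier to compare if each cluster line is reduced to its size and the share of its most common newsgroup. The sketch below is a hypothetical helper, not one of the provided tools. It parses the "label: count; ..." format shown above, and its example uses cluster8 of Clusters1.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ClusterDistributionSummary {

    // Parses "label: count; label: count; ..." into label -> count, keeping the printed order.
    public static Map<String, Integer> parse(String distribution) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String part : distribution.split(";")) {
            String entry = part.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            counts.put(entry.substring(0, colon).trim(),
                    Integer.parseInt(entry.substring(colon + 1).trim()));
        }
        return counts;
    }

    // Number of documents in the cluster.
    public static int size(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    // Share of the cluster taken by its most frequent newsgroup (0.0 for an empty cluster).
    public static double majorityShare(Map<String, Integer> counts) {
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        int total = size(counts);
        return total == 0 ? 0.0 : (double) max / total;
    }

    public static void main(String[] args) {
        // cluster8 of Clusters1 above.
        String cluster8 = "comp.os.ms-windows.misc: 43; comp.windows.x: 20; comp.graphics: 11; "
                + "sci.crypt: 6; comp.sys.ibm.pc.hardware: 6; comp.sys.mac.hardware: 5; "
                + "misc.forsale: 2; talk.politics.misc: 1; rec.motorcycles: 1; "
                + "sci.electronics: 1; talk.politics.mideast: 1;";
        Map<String, Integer> counts = parse(cluster8);
        System.out.println(String.format(Locale.ROOT, "size=%d, majority share=%.2f",
                size(counts), majorityShare(counts))); // size=97, majority share=0.44
    }
}
```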

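Summary

In this part, I ran KMeans on MapReduce and obtained the expected clustering output shown above.

This part further improved my command of MapReduce and gave me a clearer understanding of Hadoop's architecture and principles.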

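Conclusion

Through this Hadoop lab:

  1. First, I gained a working understanding of how to install and set up the Hadoop distributed framework. Working hands-on with the Hadoop Distributed File System, I came to appreciate its design for high fault tolerance and high throughput, and why it is considered one of the two core designs of the Hadoop framework.
  2. Second, completing a simple Hadoop example, Word Count, gave me a deeper understanding of the Map, Combine, and Reduce stages and taught me how to write my own code for each of them. I also saw why the MapReduce computing framework is considered the other core design of Hadoop: it offers a very convenient programming interface.
  3. Finally, following the tutorial, I ran the KMeans clustering algorithm, a classic and widely used machine-learning algorithm, on Hadoop. My own research is in deep learning, so this part showed me Hadoop's potential in my field. I thank the instructor for a carefully designed tutorial that opened up new research ideas for me.

In short, each part of this lab built on the previous one, guiding me step by step into Hadoop's basic architecture and principles and teaching me how to use it. I learned a great deal.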
