1 Links

1.1 Hive official website

http://hive.apache.org/

1.2 Documentation

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

1.3 Downloads

http://archive.apache.org/dist/hive/

1.4 GitHub repository

https://github.com/apache/hive
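Instead of uploading the tarball by hand (step 1 of section 2.1), a release can be fetched from the download address above. The sketch below assumes that the archive keeps its hive-<version>/apache-hive-<version>-bin.tar.gz directory layout and that curl is installed; hive_archive_url and download_hive are hypothetical helper names, not part of Hive.

```shell
#!/usr/bin/env bash
# Sketch: download a Hive release from archive.apache.org.
# Assumption: the archive layout is dist/hive/hive-<version>/apache-hive-<version>-bin.tar.gz

# Print the download URL for a given release, e.g. 1.2.2
hive_archive_url() {
  printf 'http://archive.apache.org/dist/hive/hive-%s/apache-hive-%s-bin.tar.gz\n' "$1" "$1"
}

# Download a release into a target directory (default: /opt/software, as in section 2.1)
download_hive() {
  local version="$1"
  local target_dir="${2:-/opt/software}"
  mkdir -p "$target_dir" || return 1
  # -f: fail on HTTP errors, -L: follow redirects, -o: output file
  curl -fL -o "$target_dir/apache-hive-${version}-bin.tar.gz" "$(hive_archive_url "$version")"
}
```

For example, download_hive 1.2.2 would save the file as /opt/software/apache-hive-1.2.2-bin.tar.gz.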

2 Installation and Deployment

2.1 Installing and Configuring Hive

1. Upload apache-hive-1.2.2-bin.tar.gz to the /opt/software directory on the Linux host.

2. Extract apache-hive-1.2.2-bin.tar.gz into /opt/module/.

[caimh@master-node software]$ ll
total 585688
-rw-rw-r--. 1 caimh caimh  90859180 Sep 26  2019 apache-hive-1.2.2-bin.tar.gz
-rw-r--r--. 1 caimh caimh 198865940 Sep  1 06:20 hadoop-2.7.4-with-centos-6.5.tar.gz
-rw-r--r--. 1 caimh caimh      8009 Sep 10 11:35 HDFSClientDemo-1.0-SNAPSHOT.jar
-rw-r--r--. 1 caimh caimh 194990602 May 28 18:07 jdk-8u211-linux-x64.tar.gz
-rw-rw-r--. 1 caimh caimh  77807942 Mar  3  2017 mysql-libs.zip
-rw-r--r--. 1 caimh caimh  37191810 Jun  7 17:16 zookeeper-3.4.13.tar.gz
[caimh@master-node software]$ tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /opt/module/

3. Rename apache-hive-1.2.2-bin to hive-1.2.2.

[caimh@master-node module]$ mv apache-hive-1.2.2-bin/ hive-1.2.2

4. In /opt/module/hive-1.2.2/conf, rename hive-env.sh.template to hive-env.sh.

[caimh@master-node conf]$ mv hive-env.sh.template hive-env.sh

5. Edit hive-env.sh:

       (a) Set HADOOP_HOME

export HADOOP_HOME=/opt/module/hadoop-2.7.4

       (b) Set HIVE_CONF_DIR

export HIVE_CONF_DIR=/opt/module/hive-1.2.2/conf
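Steps 2 to 5 can be collected into one shell function, which helps when the same layout is repeated on several machines. This is a minimal sketch: install_hive is a hypothetical helper, the paths passed to it are the ones used in this article, and instead of editing the commented-out lines in the template it simply appends the two export lines to the end of hive-env.sh.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-5 of section 2.1 (hypothetical helper, not part of Hive).
# Assumption: the tarball unpacks to a directory named apache-hive-<version>-bin/
install_hive() {
  local tarball="$1"      # e.g. /opt/software/apache-hive-1.2.2-bin.tar.gz
  local module_dir="$2"   # e.g. /opt/module
  local version="$3"      # e.g. 1.2.2
  local hadoop_home="$4"  # e.g. /opt/module/hadoop-2.7.4
  local hive_home="$module_dir/hive-$version"

  # Step 2: extract into the module directory
  tar -zxf "$tarball" -C "$module_dir" || return 1
  # Step 3: rename apache-hive-<version>-bin to hive-<version>
  mv "$module_dir/apache-hive-$version-bin" "$hive_home" || return 1
  # Step 4: turn the template into hive-env.sh
  mv "$hive_home/conf/hive-env.sh.template" "$hive_home/conf/hive-env.sh" || return 1
  # Step 5: append HADOOP_HOME and HIVE_CONF_DIR to hive-env.sh
  {
    echo "export HADOOP_HOME=$hadoop_home"
    echo "export HIVE_CONF_DIR=$hive_home/conf"
  } >> "$hive_home/conf/hive-env.sh"
}
```

With this article's paths the call would be: install_hive /opt/software/apache-hive-1.2.2-bin.tar.gz /opt/module 1.2.2 /opt/module/hadoop-2.7.4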

2.2 Hadoop Cluster Configuration

1. Start HDFS and YARN (both must be running)

[caimh@master-node hadoop-2.7.4]$ sbin/start-dfs.sh 
[caimh@master-node hadoop-2.7.4]$ sbin/start-yarn.sh

2. Create the /tmp and /user/hive/warehouse directories on HDFS and make them group-writable

[caimh@master-node hadoop-2.7.4]$ bin/hadoop fs -mkdir /tmp
[caimh@master-node hadoop-2.7.4]$ bin/hadoop fs -mkdir -p /user/hive/warehouse
[caimh@master-node hadoop-2.7.4]$ bin/hadoop fs -chmod g+w /tmp
[caimh@master-node hadoop-2.7.4]$ bin/hadoop fs -chmod g+w /user/hive/warehouse

Alternatively, turn off HDFS permission checking in Hadoop's hdfs-site.xml (restart HDFS afterwards so the NameNode picks up the change):

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
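The daemon check and directory setup from steps 1 and 2 can be scripted so that they are safe to re-run. This sketch assumes that NameNode and ResourceManager run on the current host (as on master-node here), so that jps lists them, and that Hadoop is installed at the path passed in; hadoop_daemons_running and prepare_hive_hdfs_dirs are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: verify the Hadoop daemons, then create Hive's HDFS directories.
# Assumption: NameNode and ResourceManager run on this host, so jps lists them.

# Succeed only if jps shows both a NameNode and a ResourceManager process
hadoop_daemons_running() {
  local out daemon
  out="$(jps 2>/dev/null)" || return 1
  for daemon in NameNode ResourceManager; do
    # -w keeps "SecondaryNameNode" from counting as "NameNode"
    printf '%s\n' "$out" | grep -qw "$daemon" || return 1
  done
}

# Create /tmp and /user/hive/warehouse on HDFS and make them group-writable.
# $1: Hadoop installation directory, e.g. /opt/module/hadoop-2.7.4
prepare_hive_hdfs_dirs() {
  local hadoop="$1/bin/hadoop" dir
  for dir in /tmp /user/hive/warehouse; do
    # -p: no error if the directory already exists
    "$hadoop" fs -mkdir -p "$dir" || return 1
    "$hadoop" fs -chmod g+w "$dir" || return 1
  done
}
```

Usage on master-node: hadoop_daemons_running && prepare_hive_hdfs_dirs /opt/module/hadoop-2.7.4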

3 Basic Hive Operations

[caimh@master-node hive-1.2.2]$ bin/hive        --1. Start Hive
hive> show databases;                           --2. List the databases
OK
default
Time taken: 2.566 seconds, Fetched: 1 row(s)
hive> use default;                              --3. Switch to the default database
hive> show tables;                              --4. List the tables in the default database
hive> create table Student(id int,name string); --5. Create a table
hive> show tables;                              --6. List the tables again
OK
student
Time taken: 0.04 seconds, Fetched: 1 row(s)    
hive> desc student;                             --7. View the table schema
OK
id                      int                                         
name                    string                                      
Time taken: 0.436 seconds, Fetched: 2 row(s) 
hive> insert into student(id,name) values(1,"caimh");    --8. Insert a row (Hive runs this as a MapReduce job)
Query ID = caimh_20190925122936_1860f0bc-b2b9-4d2c-b1c8-1cb0bde16cc1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1569383663698_0001, Tracking URL = http://master-node:8088/proxy/application_1569383663698_0001/
Kill Command = /opt/module/hadoop-2.7.4/bin/hadoop job  -kill job_1569383663698_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-25 12:30:08,181 Stage-1 map = 0%,  reduce = 0%
2019-09-25 12:30:22,782 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 8.1 sec
MapReduce Total cumulative CPU time: 8 seconds 100 msec
Ended Job = job_1569383663698_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master-node:9000/user/hive/warehouse/student/.hive-staging_hive_2019-09-25_12-29-36_074_6546745352170556491-1/-ext-10000
Loading data to table default.student
Table default.student stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 8.1 sec   HDFS Read: 3572 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 100 msec
OK
Time taken: 49.656 seconds
hive> select * from student;                        --9. Query the table
OK
1       caimh
Time taken: 0.241 seconds, Fetched: 1 row(s)      
hive> quit;                                         --10. Exit Hive
[caimh@master-node hive-1.2.2]$        
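The same statements can also be run non-interactively with the Hive CLI's -f option, which executes the statements in a file. In this sketch run_hive_demo is a hypothetical helper, the Hive path is the one used in this article, and the CREATE TABLE statement uses IF NOT EXISTS so that re-running the script does not fail on the existing table (each run still inserts another row).

```shell
#!/usr/bin/env bash
# Sketch: run the statements from section 3 in a single non-interactive Hive call.
# $1: Hive installation directory, e.g. /opt/module/hive-1.2.2
run_hive_demo() {
  local hive_home="$1" sql_file status
  sql_file="$(mktemp)" || return 1
  # IF NOT EXISTS lets the script be re-run after the table has been created
  cat > "$sql_file" <<'EOF'
use default;
create table if not exists student(id int, name string);
insert into student(id, name) values (1, "caimh");
select * from student;
EOF
  # -f: execute the statements in the given file, then exit
  "$hive_home/bin/hive" -f "$sql_file"
  status=$?
  rm -f "$sql_file"
  return "$status"
}
```

Usage: run_hive_demo /opt/module/hive-1.2.2 (HDFS and YARN must already be running, as in section 2.2).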

4 Note on the Hive Version

When Hive 2.x runs queries on MapReduce, it prints the following warning:

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Because Hive in this example runs on MapReduce (MR), a Hive 1.x release is used; Hive 2.x reports the deprecation warning above. The installation steps therefore use apache-hive-1.2.2-bin.tar.gz.
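Before extracting a release, the version can be read from the tarball name to confirm that it is a 1.x release. This is a small sketch that assumes the standard apache-hive-<version>-bin.tar.gz naming; hive_version_from_tarball and is_hive_1x are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that a release tarball is Hive 1.x before installing it.
# Assumption: the file follows the apache-hive-<version>-bin.tar.gz naming.

# Print the version embedded in the tarball name (empty if the name does not match)
hive_version_from_tarball() {
  basename "$1" | sed -n 's/^apache-hive-\([0-9][0-9.]*\)-bin\.tar\.gz$/\1/p'
}

# Succeed only for 1.x releases, the line this article uses to run Hive on MapReduce
is_hive_1x() {
  case "$(hive_version_from_tarball "$1")" in
    1.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: is_hive_1x /opt/software/apache-hive-1.2.2-bin.tar.gz || echo "not a Hive 1.x release"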
