Hello everyone! Here we are again (and again). Let me reintroduce myself: I'm the rather handsome type. So handsome it brings tears to your eyes.
Heh, just kidding. In this world, only beauty and big data must not be let down. Study big data well; a skill never weighs you down, and the more you learn the more dashing you get, right? Back to business: today's hands-on exercise is Hive table property operations.
What can we change? The table name, plus adding and modifying columns!

Rename a table: alter table table_name rename to new_table_name;

Modify a column (name, type, comment, position): alter table table_name change column c1 c2 int comment 'xxxxx' after severity; -- this moves the column to just after a specified column, or use 'first' to put it in the first position

Add columns: alter table table_name add columns(c1 string comment 'xxxx', c2 bigint comment 'yyyyyy'); -- note Hive's 64-bit integer type is bigint; there is no long type

1 Let's get hands-on. Type along with this code and you'll have it mostly down:

hive> create table testchange(
    > name string,value string
    > );
OK
Time taken: 0.085 seconds
hive> alter table testchange rename to test;
OK
Time taken: 0.478 seconds
hive> 

Now, always double-check:

hive> desc test;
OK
name	string	
value	string	
Time taken: 0.381 seconds
hive> 

Add two columns, type and col, to the test table:

hive> alter table test add columns(type string,col int comment 'xielaoshi');
OK
Time taken: 0.13 seconds
hive> desc test;
OK
name	string	
value	string	
type	string	
col	int	xielaoshi
Time taken: 0.235 seconds
hive> 

You can also do this: move the type column to just after name:

hive> alter table test change column type type string after name; 
OK
Time taken: 0.161 seconds
hive> desc test;                                                  
OK
name	string	
type	string	
value	string	
col	int	xielaoshi
Time taken: 0.156 seconds
hive> 

Or this: put type in the first position:

hive> alter table test change column type type string first;
OK
Time taken: 0.113 seconds
hive> desc test;                                            
OK
type	string	
name	string	
value	string	
col	int	xielaoshi
Time taken: 0.149 seconds
hive> desc formatted test;

Fun, isn't it? Honestly, the trick to learning a technology is to keep yourself amused; that's what makes it interesting!
2 Modifying tblproperties

hive> alter table test set tblproperties('comment'='xxxxx');
OK
Time taken: 0.122 seconds
hive> desc formatted test;                                  
OK
# col_name            	data_type           	comment             
	 	 
type                	string              	None                
name                	string              	None                
value               	string              	None                
col                 	int                 	xielaoshi           
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Thu Jun 02 06:20:43 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop1:9000/user/hive/warehouse/test	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	comment             	xxxxx               
	last_modified_by    	root                
	last_modified_time  	1464874178          
	transient_lastDdlTime	1464874178          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	serialization.format	1                   
Time taken: 0.14 seconds
hive> 

All this really changes is the table's description metadata. So easy!
3 Modifying serdeproperties. A SerDe does serialization and deserialization. So what exactly are serialization and deserialization?
A quick aside: serialization turns a structured object (an instance) into a byte stream (a byte array); deserialization is the reverse process, turning a byte stream back into a structured object. So if you want to store a "live" object in a file, just store its bytes; if you want to send a "live" object to a remote host, just send its bytes; and when you need the object again, deserialize and it "comes back to life".
Serializing an object into a file is called "persistence"; serializing an object to send it to a remote machine is called "data communication".
Not hard to grasp, right? Let me think of a good analogy... it's like phoning your parents: your voice is converted into an electrical signal (serialization), and at their end the signal is converted back into sound (deserialization).
Hadoop needs serialization whenever cluster nodes communicate or make RPC calls, and that serialization has to be fast, compact, and light on bandwidth, so you really must understand Hadoop's serialization mechanism.
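The round trip described above can be sketched in a few lines of Python. This uses the standard pickle module purely as an illustration; Hadoop itself uses its own Writable mechanism, not pickle, but the idea is identical:

```python
import pickle

# A "live" structured object, e.g. one row of our data
record = {"time": "20130829234535", "country": "china", "city": "nanyang"}

# Serialize: structured object -> byte stream. These bytes could be
# written to a file (persistence) or sent over a socket (data communication).
data = pickle.dumps(record)

# Deserialize: the byte stream "comes back to life" as an equal object
revived = pickle.loads(data)
print(revived == record)
```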
Hive sits on top of Hadoop, and serialization matters a lot to Hadoop. How much?
Communication between Hadoop nodes is implemented through remote procedure calls (RPC), covering both interprocess communication and permanent storage, so serialization and deserialization show up constantly in distributed data processing. RPC serialization is therefore expected to have the following properties:

    1. Compact: a compact format makes full use of network bandwidth, the scarcest resource in a data center;
    2. Fast: interprocess communication forms the backbone of a distributed system, so the performance overhead of serialization and deserialization must be kept to a minimum;
    3. Extensible: protocols evolve to meet new requirements, so when new protocol messages are introduced between client and server, the original serialization format must be able to support them;
    4. Interoperable: clients and servers written in different languages must be able to talk to each other.
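The "compact" requirement is easy to see with a toy comparison: a fixed-width binary encoding of a number is smaller than its text form. A sketch using Python's struct module (not Hadoop's actual wire format):

```python
import struct

timestamp = 20130829234535

# Text encoding: one byte per digit -> 14 bytes
text_bytes = str(timestamp).encode("utf-8")

# Fixed-width binary encoding: a big-endian signed 64-bit integer -> 8 bytes
binary_bytes = struct.pack(">q", timestamp)

print(len(text_bytes), len(binary_bytes))  # the binary form is smaller
```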

OK, enough rambling. How you modify serdeproperties differs depending on whether the table is partitioned.
Without partitions:

alter table table_name set serdeproperties('field.delim'='\t');

And what about a partitioned table?

alter table test partition(dt='xxxxx') set serdeproperties('field.delim'='\t');

Why would you modify this? The reason is simple: some data just isn't in the format your analysis expects, and deleting it is out of the question, so modifying the table is your only option.
Heh, but something concrete is always better:

hive> create table city(
    > time string,
    > country string,
    > province string,
    > city string)
    > row format delimited fields terminated by '#' 
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 0.111 seconds
hive> 

Load a bit of data to play with:

hive> load data local inpath '/usr/host/city' into table city;
Copying data from file:/usr/host/city
Copying file: file:/usr/host/city
Loading data to table default.city
OK
Time taken: 0.426 seconds
hive> select * from city;
OK
20130829234535	china	henan	nanyang	NULL	NULL	NULL
20130829234536	china	henan	xinyang	NULL	NULL	NULL
20130829234537	china	beijing	beijing	NULL	NULL	NULL
20130829234538	china	jiang	susuzhou	NULL	NULL	NULL
20130829234539	china	hubei	wuhan	NULL	NULL	NULL
20130829234540	china	sandong	weizhi	NULL	NULL	NULL
20130829234541	china	hebei	shijiazhuang	NULL	NULL	NULL
20130829234542	china	neimeng	eeduosi	NULL	NULL	NULL
20130829234543	china	beijing	beijing	NULL	NULL	NULL
20130829234544	china	jilin	jilin	NULL	NULL	NULL
Time taken: 0.105 seconds
hive>  

Can't tell what's actually in my city file? Let me show you!

[root@hadoop1 data3]# cd /usr/host/
[root@hadoop1 host]# cat city 
20130829234535	china	henan	nanyang
20130829234536	china	henan	xinyang
20130829234537	china	beijing	beijing
20130829234538	china	jiang	susuzhou
20130829234539	china	hubei	wuhan
20130829234540	china	sandong	weizhi
20130829234541	china	hebei	shijiazhuang
20130829234542	china	neimeng	eeduosi
20130829234543	china	beijing	beijing
20130829234544	china	jilin	jilin
[root@hadoop1 host]# 

Neat, right? But here's the problem: the raw data below is perfectly aligned, so why is the query output above full of NULLs? Patience, dear reader, let me take a sip of water first.
Here it is: the fields in my file are separated by the tab character '\t', while the table expects '#'. Hive looks for '#', doesn't find it, stuffs the entire line into the first field, and pads the remaining three fields with NULL. Make sense?
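You can simulate what Hive's text SerDe does with a few lines of Python (a simplification, of course; the real LazySimpleSerDe does much more):

```python
def parse_row(line, delim, num_cols):
    """Split a line on the table's field delimiter; extra columns are
    dropped and missing trailing columns are padded with None
    (which Hive displays as NULL)."""
    fields = line.rstrip("\n").split(delim)[:num_cols]
    return fields + [None] * (num_cols - len(fields))

line = "20130829234535\tchina\thenan\tnanyang"

# Table declared with '#': no '#' found, the whole line lands in column 1
print(parse_row(line, "#", 4))

# Table declared with '\t': the fields line up as intended
print(parse_row(line, "\t", 4))
```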
Don't believe me? Take a look:

hive> desc formatted city;                                      
OK
# col_name            	data_type           	comment             
	 	 
time                	string              	None                
country             	string              	None                
province            	string              	None                
city                	string              	None                
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Thu Jun 02 06:43:57 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop1:9000/user/hive/warehouse/city	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1464875117          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	#                   
	line.delim          	\n                  
	serialization.format	#                   
Time taken: 0.132 seconds
hive> 

See the "field.delim  #" line? Let's fix it: change the delimiter from '#' to '\t'.

hive> alter table city set serdeproperties('field.delim'='\t');
OK
Time taken: 0.118 seconds
hive> select * from city;
OK
20130829234535	china	henan	nanyang
20130829234536	china	henan	xinyang
20130829234537	china	beijing	beijing
20130829234538	china	jiang	susuzhou
20130829234539	china	hubei	wuhan
20130829234540	china	sandong	weizhi
20130829234541	china	hebei	shijiazhuang
20130829234542	china	neimeng	eeduosi
20130829234543	china	beijing	beijing
20130829234544	china	jilin	jilin
Time taken: 0.125 seconds
hive> 

Next, let's create a partitioned table, city1:

hive> create table city1(
    > time string,
    > country string,
    > province string,
    > city string) partitioned by (dt string)
    > row format delimited fields terminated by '#' 
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 0.046 seconds
hive> 

Load data, query it, and look at the table description:

hive> load data local inpath '/usr/host/city' into table city1 partition(dt='20160519');
Copying data from file:/usr/host/city
Copying file: file:/usr/host/city
Loading data to table default.city1 partition (dt=20160519)
OK
Time taken: 0.189 seconds
hive> select * from city1;
OK
20130829234535	china	henan	nanyang	NULL	NULL	NULL	20160519
20130829234536	china	henan	xinyang	NULL	NULL	NULL	20160519
20130829234537	china	beijing	beijing	NULL	NULL	NULL	20160519
20130829234538	china	jiang	susuzhou	NULL	NULL	NULL	20160519
20130829234539	china	hubei	wuhan	NULL	NULL	NULL	20160519
20130829234540	china	sandong	weizhi	NULL	NULL	NULL	20160519
20130829234541	china	hebei	shijiazhuang	NULL	NULL	NULL	20160519
20130829234542	china	neimeng	eeduosi	NULL	NULL	NULL	20160519
20130829234543	china	beijing	beijing	NULL	NULL	NULL	20160519
20130829234544	china	jilin	jilin	NULL	NULL	NULL	20160519
Time taken: 0.083 seconds
hive> desc formatted city1;
OK
# col_name            	data_type           	comment             
	 	 
time                	string              	None                
country             	string              	None                
province            	string              	None                
city                	string              	None                
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
dt                  	string              	None                
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Thu Jun 02 06:55:56 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop1:9000/user/hive/warehouse/city1	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1464875756          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	#                   
	line.delim          	\n                  
	serialization.format	#                   
Time taken: 0.076 seconds
hive> 

Change the '#' delimiter to '\t':

hive> alter table city1 set serdeproperties('field.delim'='\t');
OK
Time taken: 0.103 seconds
hive> 

That alone doesn't fix data that's already loaded: an existing partition keeps its own copy of the SerDe properties, so modify it at the partition level as well:

hive> alter table city1 partition(dt='20160519') set serdeproperties('field.delim'='\t');
OK
Time taken: 0.141 seconds
hive> select * from city1;                                                               
OK
20130829234535	china	henan	nanyang	20160519
20130829234536	china	henan	xinyang	20160519
20130829234537	china	beijing	beijing	20160519
20130829234538	china	jiang	susuzhou	20160519
20130829234539	china	hubei	wuhan	20160519
20130829234540	china	sandong	weizhi	20160519
20130829234541	china	hebei	shijiazhuang	20160519
20130829234542	china	neimeng	eeduosi	20160519
20130829234543	china	beijing	beijing	20160519
20130829234544	china	jilin	jilin	20160519
Time taken: 0.086 seconds
hive> 

Play around with it some more:

hive> select * from city1 where dt='20160519';
OK
20130829234535	china	henan	nanyang	20160519
20130829234536	china	henan	xinyang	20160519
20130829234537	china	beijing	beijing	20160519
20130829234538	china	jiang	susuzhou	20160519
20130829234539	china	hubei	wuhan	20160519
20130829234540	china	sandong	weizhi	20160519
20130829234541	china	hebei	shijiazhuang	20160519
20130829234542	china	neimeng	eeduosi	20160519
20130829234543	china	beijing	beijing	20160519
20130829234544	china	jilin	jilin	20160519
Time taken: 0.579 seconds
hive> 

4 Modifying the location
This post is getting long; hang in there, we're nearly done.

alter table table_name [partition(partition_spec)] set location 'path';
alter table table_name set tblproperties('EXTERNAL'='TRUE'); -- convert a managed (internal) table to an external table
alter table table_name set tblproperties('EXTERNAL'='FALSE'); -- convert an external table to a managed table

For example:

[root@hadoop1 host]# hadoop fs -mkdir /location
[root@hadoop1 host]# hadoop fs -put /usr/host/city /location
[root@hadoop1 host]# hadoop fs -ls /location
Found 1 items
-rw-r--r--   1 root supergroup        359 2016-06-02 07:04 /location/city
[root@hadoop1 host]# 

Next I'll change city's location from hdfs://hadoop1:9000/user/hive/warehouse/city to hdfs://hadoop1:9000/location:

hive> alter table city set location 'hdfs://hadoop1:9000/location';
OK
Time taken: 0.112 seconds

Don't believe it? Query the table, then run desc formatted city:

hive> select * from city;
OK
20130829234535	china	henan	nanyang
20130829234536	china	henan	xinyang
20130829234537	china	beijing	beijing
20130829234538	china	jiang	susuzhou
20130829234539	china	hubei	wuhan
20130829234540	china	sandong	weizhi
20130829234541	china	hebei	shijiazhuang
20130829234542	china	neimeng	eeduosi
20130829234543	china	beijing	beijing
20130829234544	china	jilin	jilin
Time taken: 0.112 seconds
hive> desc formatted city;                                         
OK
# col_name            	data_type           	comment             
	 	 
time                	string              	None                
country             	string              	None                
province            	string              	None                
city                	string              	None                
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Thu Jun 02 06:43:57 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop1:9000/location	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	last_modified_by    	root                
	last_modified_time  	1464876384          
	transient_lastDdlTime	1464876384          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	\t                  
	line.delim          	\n                  
	serialization.format	#                   
Time taken: 0.078 seconds
hive> 

The Location above has changed. Now let's see what happens when we drop this table:

hive> drop table city;
OK
Time taken: 0.122 seconds
hive> 

Once you drop city, the data at the location recorded in the metadata is gone as well, because city is a MANAGED_TABLE, an internal table.
So what's the difference between external and internal tables? Leave a comment, dear reader!

5 Recreate city:

hive> create table city(
    > time string,
    > country string,
    > province string,
    > city string)
    > row format delimited fields terminated by '#' 
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 1.042 seconds
hive>

Of course, this is an internal table. Now let me convert it into an external table:

hive> 
    > alter table city set tblproperties('EXTERNAL'='TRUE');
OK
Time taken: 0.184 seconds
hive> desc formatted city;
OK
# col_name            	data_type           	comment             
	 	 
time                	string              	None                
country             	string              	None                
province            	string              	None                
city                	string              	None                
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Thu Jun 02 07:15:20 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop1:9000/user/hive/warehouse/city	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
	EXTERNAL            	TRUE                
	last_modified_by    	root                
	last_modified_time  	1464876990          
	transient_lastDdlTime	1464876990          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	#                   
	line.delim          	\n                  
	serialization.format	#                   
Time taken: 0.296 seconds
hive> 

So how do you turn an external table back into an internal one?
Run:

alter table city set tblproperties('EXTERNAL'='FALSE');
desc formatted city;

All right, that's it for today; I'm a bit tired and could use a rest. Thanks for reading!

see you again!
