Accessing HDFS from Greenplum through external tables
Environment:
OS: CentOS Linux release 7.2.1511 (Core)
Greenplum version: 4.3.16
Setup procedure:
1. Install Greenplum. Cluster information:
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
10 | 8 | p | p | s | u | 40000 | gp-s0008 | sdw8 | 41000 |
11 | 9 | p | p | s | u | 40001 | gp-s0008 | sdw8 | 41001 |
12 | 10 | p | p | s | u | 40002 | gp-s0008 | sdw8 | 41002 |
13 | 11 | p | p | s | u | 40003 | gp-s0008 | sdw8 | 41003 |
26 | 8 | m | m | s | u | 50000 | gp-s0009 | sdw9 | 51000 |
27 | 9 | m | m | s | u | 50001 | gp-s0009 | sdw9 | 51001 |
28 | 10 | m | m | s | u | 50002 | gp-s0009 | sdw9 | 51002 |
29 | 11 | m | m | s | u | 50003 | gp-s0009 | sdw9 | 51003 |
1 | -1 | p | p | s | u | 2345 | sdw7 | sdw7 | |
6 | 4 | p | p | s | u | 40000 | gp-s0007 | sdw7 | 41000 |
22 | 4 | m | m | s | u | 50000 | gp-s0008 | sdw8 | 51000 |
7 | 5 | p | p | s | u | 40001 | gp-s0007 | sdw7 | 41001 |
23 | 5 | m | m | s | u | 50001 | gp-s0008 | sdw8 | 51001 |
8 | 6 | p | p | s | u | 40002 | gp-s0007 | sdw7 | 41002 |
24 | 6 | m | m | s | u | 50002 | gp-s0008 | sdw8 | 51002 |
9 | 7 | p | p | s | u | 40003 | gp-s0007 | sdw7 | 41003 |
25 | 7 | m | m | s | u | 50003 | gp-s0008 | sdw8 | 51003 |
2 | 0 | p | p | c | u | 40000 | gp-s0010 | sdw10 | 41000 |
18 | 0 | m | m | s | d | 50000 | gp-s0007 | sdw7 | 51000 |
3 | 1 | p | p | c | u | 40001 | gp-s0010 | sdw10 | 41001 |
19 | 1 | m | m | s | d | 50001 | gp-s0007 | sdw7 | 51001 |
4 | 2 | p | p | c | u | 40002 | gp-s0010 | sdw10 | 41002 |
20 | 2 | m | m | s | d | 50002 | gp-s0007 | sdw7 | 51002 |
5 | 3 | p | p | c | u | 40003 | gp-s0010 | sdw10 | 41003 |
21 | 3 | m | m | s | d | 50003 | gp-s0007 | sdw7 | 51003 |
14 | 12 | p | p | c | u | 40000 | gp-s0009 | sdw9 | 41000 |
30 | 12 | m | m | s | d | 50000 | gp-s0010 | sdw10 | 51000 |
15 | 13 | p | p | c | u | 40001 | gp-s0009 | sdw9 | 41001 |
31 | 13 | m | m | s | d | 50001 | gp-s0010 | sdw10 | 51001 |
16 | 14 | p | p | c | u | 40002 | gp-s0009 | sdw9 | 41002 |
32 | 14 | m | m | s | d | 50002 | gp-s0010 | sdw10 | 51002 |
17 | 15 | p | p | c | u | 40003 | gp-s0009 | sdw9 | 41003 |
33 | 15 | m | m | s | d | 50003 | gp-s0010 | sdw10 | 51003 |
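The listing above is long, and the rows that matter most are the ones that are not healthy. The following is a minimal sketch, assuming the columns carry the meaning they have in Greenplum's gp_segment_configuration catalog (status `u` = up, `d` = down). It parses a few rows copied from the listing and reports the segments marked down; the helper names `parse_rows` and `down_segments` are hypothetical.

```python
# Header and a few rows copied from the listing above (a subset, for illustration).
HEADER = ("dbid | content | role | preferred_role | mode | status | port | "
          "hostname | address | replication_port | san_mounts")
ROWS = """\
2 | 0 | p | p | c | u | 40000 | gp-s0010 | sdw10 | 41000 |
18 | 0 | m | m | s | d | 50000 | gp-s0007 | sdw7 | 51000 |
10 | 8 | p | p | s | u | 40000 | gp-s0008 | sdw8 | 41000 |
26 | 8 | m | m | s | u | 50000 | gp-s0009 | sdw9 | 51000 |
"""


def parse_rows(header, text):
    """Turn pipe-delimited rows into a list of dicts keyed by column name."""
    names = [c.strip() for c in header.split("|")]
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cells = [c.strip() for c in line.split("|")]
        rows.append(dict(zip(names, cells)))
    return rows


def down_segments(rows):
    """Return the rows whose status column is 'd' (assumed to mean down)."""
    return [r for r in rows if r["status"] == "d"]


segments = parse_rows(HEADER, ROWS)
for seg in down_segments(segments):
    print(f"dbid={seg['dbid']} content={seg['content']} "
          f"role={seg['role']} host={seg['hostname']} is down")
```

Applied to the full listing, the same filter would flag every mirror whose status is `d`.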
2. Install the JDK. Path: /usr/java/jdk1.7.0_80
3. Install the Hadoop environment. Path:
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11
4. Configure environment variables:
(1) Configure the GP environment variables (required on every GP node):
.bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_80
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/client
source /apps/greenplum/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/export/gpdata/gpmaster/gpsegs-1
export PGPORT=2345
(2) Configure the environment variables in hadoop_env.sh (required on every GP node):
export JAVA_HOME=/usr/java/jdk1.7.0_80
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/client
source /apps/greenplum/greenplum_path.sh
(3) Configure the Hadoop version and path (run on the master node only):
gpconfig -c gp_hadoop_target_version -v "cdh5"
gpconfig -c gp_hadoop_home -v "'/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/client'"
(4) Run the following command to make the parameters take effect:
gpstop -u
5. Grant privileges to gpadmin in the GP cluster
-- Grant privileges on the gphdfs protocol
grant insert on protocol gphdfs to gpadmin;
grant select on protocol gphdfs to gpadmin;
grant all on protocol gphdfs to gpadmin;
6. Restart the Greenplum cluster:
gpstop -a
gpstart -a
7. Configure the gpadmin user so that it can access the HDFS environment and has access permissions on HDFS
8. Test as the gpadmin user
hdfs dfs -ls /user/gpadmin/*
hdfs dfs -ls /user/gpadmin/devinfo.txt
9. Log in to the GP cluster and create the external table
drop external table if exists ext_devinfo;
CREATE EXTERNAL TABLE ext_devinfo(
devid varchar(50),
appid varchar(50)
)
LOCATION ('gphdfs://perf044:8020/user/gpadmin/devinfo.txt') format 'text' (delimiter '|');
Problems encountered:
(1) Java class TaskAttemptContext not found
When GP accessed HDFS, it reported that the class TaskAttemptContext was missing.
The class is not found in the /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop directory itself but one level down, in /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/client
Solution: set HADOOP_HOME to /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/client
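One way to confirm which directory actually provides the class is to look inside the jar files. The following is a minimal sketch: it scans the jars directly inside a directory for a class entry. The fully qualified name `org.apache.hadoop.mapreduce.TaskAttemptContext` is an assumption (the text above gives only the simple class name), and the helper `jars_containing` is hypothetical.

```python
import os
import zipfile

# Assumption: full package path of the missing class; the text only says "TaskAttemptContext".
CLASS_ENTRY = "org/apache/hadoop/mapreduce/TaskAttemptContext.class"


def jars_containing(directory, class_entry=CLASS_ENTRY):
    """Return sorted names of .jar files directly inside `directory` that contain `class_entry`."""
    hits = []
    if not os.path.isdir(directory):
        return hits
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # isfile() follows symlinks, so symlinked jars are inspected as well.
        if not name.endswith(".jar") or not os.path.isfile(path):
            continue
        try:
            with zipfile.ZipFile(path) as jar:
                if class_entry in jar.namelist():
                    hits.append(name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits


if __name__ == "__main__":
    base = "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop"
    for candidate in (base, os.path.join(base, "client")):
        print(candidate, "->", jars_containing(candidate) or "class not found")
```

The scan is deliberately not recursive, so it shows which of the two candidate HADOOP_HOME values has the class in its own top-level jars.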
(2) After the external table was created successfully, every query against it returned the following message:
DETAIL:
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: user
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
Command: execute:source $GPHOME/lib//hadoop/hadoop_env.sh;java $GP_JAVA_OPT -classpath $CLASSPATH com.emc.greenplum.gpdb.hdfsconnector.HDFSReader $GP_SEGMENT_ID $GP_SEGMENT_COUNT TEXT cdh4.1-gnet-1.2.0.0 'gphdfs://user/gpadmin/devinfo.txt' '000000104300044000000104300044' 'devid,appid,'
External table ext_devinfo, file gphdfs://user/gpadmin/devinfo.txt
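The trace suggests that the LOCATION in use when the error occurred was gphdfs://user/gpadmin/devinfo.txt, with no NameNode host and port, so the first path component ("user") ended up in the host position. The following is a minimal sketch of that effect using Python's generic URI parser; it only illustrates how such a URI splits and is not the connector's own parsing code. The helper `describe_location` is hypothetical.

```python
from urllib.parse import urlsplit


def describe_location(location):
    """Split a gphdfs:// LOCATION into (host, port, path) as a generic URI parser sees it."""
    parts = urlsplit(location)
    return parts.hostname, parts.port, parts.path


# LOCATION from the error message: no NameNode host, so "user" is read as the host.
print(describe_location("gphdfs://user/gpadmin/devinfo.txt"))
# -> ('user', None, '/gpadmin/devinfo.txt')

# LOCATION from the CREATE EXTERNAL TABLE statement in step 9.
print(describe_location("gphdfs://perf044:8020/user/gpadmin/devinfo.txt"))
# -> ('perf044', 8020, '/user/gpadmin/devinfo.txt')
```

With the host and port present, as in the statement from step 9, the path /user/gpadmin/devinfo.txt stays intact and the host resolves to the NameNode instead of "user".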