Hive migration can be automated directly from the CDH management console (a Backup → Replication Schedule can be set up for Hive); this article instead walks through migrating Hive data in bulk with the hadoop distcp command.

A hadoop distcp migration consists of three steps: 1. batch table creation on the target cluster; 2. migration of the Hive data files; 3. partition recovery.
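
For orientation, the basic shape of a cross-cluster distcp call is shown below; the NameNode addresses and the warehouse path are placeholders, not values from this migration.

## A minimal sketch of a cross-cluster copy; source-nn/target-nn and the path are placeholders.
hadoop distcp \
  hdfs://source-nn:8020/user/hive/warehouse/db_name.db/table_name \
  hdfs://target-nn:8020/user/hive/warehouse/db_name.db/table_name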

==============Batch-export table DDL and create the tables on the target cluster================

#!/bin/bash
##get_hive_db_tables.sh
##Run on the source cluster

##The corresponding databases must already exist on the target cluster before the tables are migrated (see the sketch after this script)
##Directory where the exported table metadata is stored
where_src_table_info="/home/hadoop/hive_db_tables"

##List all databases on the source cluster (excluding default and test)
hive -e " show databases; exit ;" |  grep -v default | grep -v test  > ${where_src_table_info}/databases.txt


if [ ! -d "${where_src_table_info}/tables" ] ; then
      mkdir "${where_src_table_info}/tables"
fi

if [ ! -d "${where_src_table_info}/desc_table" ] ; then
      mkdir "${where_src_table_info}/desc_table"
fi

for database in `cat ${where_src_table_info}/databases.txt`
do
  {
  echo "database:${database}"
  hive -e " use $database ;  show tables ; exit ;"   > ${where_src_table_info}/tables/$database
    ##Strip the header line and WARN lines from the generated file
  sed -i '1d'  ${where_src_table_info}/tables/$database
  sed -i "/WARN:/d"  ${where_src_table_info}/tables/$database
 
  
  
  for table in `cat ${where_src_table_info}/tables/$database  ` 
  do
	if [ ! -d "${where_src_table_info}/desc_table/$database" ] ; then
      mkdir "${where_src_table_info}/desc_table/$database"
	fi
     echo "table:${database}.${table}"
	 ##Dump the table DDL as a backup
     hive -e "use $database ; show create table $table ;" >> ${where_src_table_info}/desc_table/$database/${table}.sql
     echo >> ${where_src_table_info}/desc_table/$database/${table}.sql
	 ##Clean up extraneous output in the DDL (sed directly on the file so it does not fail when the pattern is absent)
	 sed -i "/WARN:/d" ${where_src_table_info}/desc_table/$database/${table}.sql
	 sed -i "/createtab_stmt/d" ${where_src_table_info}/desc_table/$database/${table}.sql
	 sed -i  "s/\`//g"     ${where_src_table_info}/desc_table/$database/${table}.sql
	 echo ";" >> ${where_src_table_info}/desc_table/$database/${table}.sql
	 
	 ##Create the table on the target cluster over SSH, then refresh the Impala metadata
	 v_create_sql=`cat ${where_src_table_info}/desc_table/$database/${table}.sql`
	 echo $v_create_sql
	 ssh  username@xx.xx.xx.xx " hive -e  \"use $database;  $v_create_sql \" ; impala-shell -q \"invalidate metadata ${database}.${table} \" "
  done

  }
done
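
The script above assumes the corresponding databases already exist on the target cluster. A minimal sketch for creating them from the exported databases.txt, run on the target cluster after copying that file over (the path below is an assumption), could look like this:

#!/bin/bash
##Hedged sketch, not part of the original scripts: create every database listed in databases.txt.
##Assumes databases.txt has been copied to /home/hadoop/hive_db_tables on the target host.
for database in `cat /home/hadoop/hive_db_tables/databases.txt`
do
  hive -e "create database if not exists ${database};"
done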

==============Move the data files between clusters================

#!/bin/bash
##Run on the target cluster
where_src_table_info="/home/hadoop/hive_db_tables"

##List of databases to migrate
v_database='db_name1
db_name2'  


for   database in  $v_database  
do


ret=`ssh user_name@xx.xx.xx.xx  " cat ${where_src_table_info}/tables/$database "`
echo "开始移动数据"${database}-${ret}
	for tem in $ret;
	do
		
		echo "开始移动数据"${database}-${tem}
		starttime=`date +%s`
		##distcp要求目标地址和源集群地址必须使用namenode,参数的使用可自动查询选择
		 hadoop distcp    -bandwidth 300  -m 50 hdfs://xx.xx.xx.xx:8020/user/hive/warehouse/${database}.db/${tem}/*   hdfs://xx.xx.xx.xx:8020/user/hive/warehouse/${database}.db/${tem} 
		 endtime=`date +%s`
		echo "结束distcp,本次移动"${database}"."$tem"花费时间"$[$(($endtime - $starttime))/60]"分钟"
    done
done
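
As the comment inside the loop notes, the remaining distcp options can be chosen to fit the situation. For example, when a copy has to be re-run after a partial transfer, -update (copy only files that differ from the target) is a common addition; the call below is only an illustration with placeholder addresses, not the exact command used in this migration.

##Illustration only; source-nn/target-nn and the path are placeholders.
hadoop distcp -update -bandwidth 300 -m 50 \
  hdfs://source-nn:8020/user/hive/warehouse/db_name.db/table_name \
  hdfs://target-nn:8020/user/hive/warehouse/db_name.db/table_name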

===============Add partitions==============

#!/bin/bash
##Run on the target cluster
##List of tables (database.table) whose partitions need to be restored
v_table_names='databasename.tab_name'
for   table_name in  $v_table_names
do
##Start date for partition recovery (earliest date_timekey directory on HDFS)
v_start_data_str=`hdfs  dfs -ls /user/hive/warehouse/${table_name%.*}.db/${table_name##*.}/ |  grep date_timekey=| head -n 1`
lvv_start_timekey=${v_start_data_str##*date_timekey=}

##End date for partition recovery (latest date_timekey directory on HDFS)
v_end_data_str=`hdfs  dfs -ls /user/hive/warehouse/${table_name%.*}.db/${table_name##*.}/ |  grep date_timekey=| tail -n 1`
lvv_end_timekey=${v_end_data_str##*date_timekey=}

lvn_start_sec=`date -d "$lvv_start_timekey" "+%s"`
lvn_end_sec=`date -d "$lvv_end_timekey" "+%s"`

echo "==============================开始${table_name}分区恢复,恢复开始时间${lvv_start_timekey}=========================================="
##Iterate one day at a time; <= so the last partition date found on HDFS is also added
for((i=$lvn_start_sec;i<=$lvn_end_sec;i+=86400))
do
     lvv_data_start=`date -d "@$i" "+%Y%m%d"`
     echo $lvv_data_start
     j=$[ i + 86400 ]
     lvv_data_end=`date -d "@$j" "+%Y%m%d"`
     echo $lvv_data_end
	impala-shell -q "ALTER TABLE ${table_name} ADD IF  not EXISTS PARTITION (date_timekey='$lvv_data_start');"
	
done
echo "==============================结束${table_name}分区恢复=========================================="
done
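
If the partition directories already sit under the table path in the standard date_timekey=YYYYMMDD layout, an alternative to adding partitions day by day is to let the engine discover them. This is a hedged alternative to the loop above, not what the script does; the table name is a placeholder.

##Hive: scan HDFS and register any partitions missing from the metastore.
hive -e "use databasename; msck repair table tab_name;"
##Impala: equivalent partition discovery.
impala-shell -q "alter table databasename.tab_name recover partitions"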

Problems encountered during the migration

        1. When hadoop distcp was run as the root user on the target cluster, no error was reported, but the job made no progress after submission: the log hung right after mapreduce.Job: Running job: and the map tasks never started:

21/08/22 13:12:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1629427129196_0056
21/08/22 13:12:45 INFO mapreduce.JobSubmitter: Executing with tokens: []
21/08/22 13:12:45 INFO conf.Configuration: resource-types.xml not found
21/08/22 13:12:45 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/08/22 13:12:45 INFO impl.YarnClientImpl: Submitted application application_1629427129196_0056
21/08/22 13:12:45 INFO mapreduce.Job: The url to track the job: http://t4hadoopap01:8088/proxy/application_1629427129196_0056/
21/08/22 13:12:45 INFO tools.DistCp: DistCp job-id: job_1629427129196_0056
21/08/22 13:12:45 INFO mapreduce.Job: Running job: job_1629427129196_0056

The test cluster had limited resources, and the YARN page in the CDH console showed many applications stuck there, which prevented new jobs from running. Clearing the stuck applications resolved it.
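
Besides the CDH console, the stuck applications can also be cleared from the YARN command line; a minimal sketch (the application id is a placeholder):

##List applications that are still queued or running, then kill the stuck ones.
yarn application -list -appStates ACCEPTED,RUNNING
yarn application -kill <application_id>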

        2. ERROR tools.DistCp: Exception encountered

ERROR tools.DistCp: Exception encountered
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hive/.staging/_distcp-683351060/fileList.seq could only be written to 0 of the 1 minReplication nodes. There are 4 datanode(s) running and 4 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2102)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2673)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

3. ERROR tools.DistCp: Exception encountered: File /user/hive/.staging/_distcp-683351060/fileList.seq could only be written to 0

4. ERROR tools.DistCp: Invalid arguments

ERROR tools.DistCp: Invalid arguments:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1962)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1421)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3055)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1151)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:940)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
        at org.apache.hadoop.ipc.Client.call(Client.java:1445)
        at org.apache.hadoop.ipc.Client.call(Client.java:1355)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:875)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1630)
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1496)
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1617)
        at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:241)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:432)
Invalid arguments: Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1962)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1421)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3055)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1151)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:940)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

This roughly means the arguments are invalid: the cluster reports that operation category READ is not supported in the standby state.

The same hadoop distcp command migrated data successfully the first time but failed on the second run.
It turned out that a parameter change on the target cluster, followed by an HDFS restart, had caused a NameNode failover: the IP that previously belonged to the active NameNode now pointed to the standby node.
Pointing the command at the target cluster's current active NameNode IP fixed it.
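
Before re-running distcp it is worth confirming which NameNode is currently active; a minimal sketch, assuming the HA NameNode ids are nn1 and nn2 (check dfs.ha.namenodes.<nameservice> in hdfs-site.xml for the real ids):

##Prints "active" or "standby" for each NameNode id.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2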
