一、Container Processes

This assumes an HA HBase cluster deployed on k8s (three hbase-master nodes, five hbase-slave nodes).

Processes in the hbase-master containers:
master-1(nn1):
    1.HMaster
	2.DFSZKFailoverController
	3.NameNode
	4.JournalNode
	5.QuorumPeerMain
master-2(nn2):
    1.HMaster
	2.DFSZKFailoverController
	3.NameNode
	4.JournalNode
	5.QuorumPeerMain
master-3:
    1.JournalNode
	2.QuorumPeerMain
	
Processes in the hbase-slave containers:
    1.HRegionServer
	2.DataNode
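A quick way to verify that the expected daemons are actually running inside a pod is to run `jps` there and filter out the `Jps` process itself. This is a sketch: the helper name is ours, and the pod name and kubectl access are assumptions.

```shell
# Extract daemon names from `jps` output: drop the pid column and the Jps process itself.
filter_daemons() { awk '$2 != "Jps" { print $2 }' | sort; }

# Against the cluster (hypothetical pod name, assumes kubectl access):
# kubectl exec hbase-master-1 -- jps | filter_daemons
```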

二、HMaster

2.1 Restart the HMaster process

su hadoop
~/hbase-current/bin/hbase-daemon.sh stop master
~/hbase-current/bin/hbase-daemon.sh start master

2.2 Check master status from the command line

su hadoop
curl http://${master-ip}:${port}/master-status
curl http://127.0.0.1:60010/master-status

三、NameNode

3.1 Restart the NameNode

su - hadoop
hadoop-daemon.sh stop namenode
hadoop-daemon.sh start namenode

3.2 Check NameNode status

su hadoop
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
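In a healthy HA pair, exactly one NameNode should report active and the other standby. That check can be done mechanically; a small sketch (the function name is ours; the state strings come from the commands above):

```shell
# Verify that one NameNode reports active and the other standby.
check_ha() {
  case "$1/$2" in
    active/standby|standby/active) echo OK ;;
    *) echo "BAD: $1/$2" ;;
  esac
}

# Hypothetical usage against the cluster:
# check_ha "$(hdfs haadmin -getServiceState nn1)" "$(hdfs haadmin -getServiceState nn2)"
```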

3.3 Check SafeMode

su - hadoop
hdfs dfsadmin -safemode get
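After a restart it can take a while for the NameNode to leave safe mode. A sketch of a wait loop (the helper name is ours; it matches on the "Safe mode is OFF" text that `dfsadmin -safemode get` prints):

```shell
# True when the dfsadmin output says safe mode is off.
safemode_off() { printf '%s' "$1" | grep -q 'Safe mode is OFF'; }

# Hypothetical wait loop against the cluster:
# until safemode_off "$(hdfs dfsadmin -safemode get)"; do sleep 10; done
```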

四、RegionServer

4.1 Restart the RegionServer

su - hadoop
~/hbase-current/bin/hbase-daemon.sh restart regionserver

Note: for a rolling restart that first moves regions off the server, HBase also ships bin/graceful_stop.sh.

五、DataNode

5.1 Restart the DataNode

su - hadoop
# Use hadoop-daemon.sh (singular) inside a single container;
# hadoop-daemons.sh would act on every host listed in the slaves file.
/home/hadoop/hadoop-current/sbin/hadoop-daemon.sh stop datanode
/home/hadoop/hadoop-current/sbin/hadoop-daemon.sh start datanode

六、ZooKeeper

6.1 Restart ZooKeeper

su hadoop
~/zookeeper-current/bin/zkServer.sh restart

6.2 Check ZooKeeper status

su hadoop
~/zookeeper-current/bin/zkServer.sh status
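`zkServer.sh status` prints a `Mode: leader|follower` line; a sketch for extracting just that value (the function name is ours):

```shell
# Pull the Mode value (leader/follower/standalone) out of zkServer.sh status output.
zk_mode() { printf '%s\n' "$1" | awk -F': ' '/^Mode/ { print $2 }'; }

# Hypothetical usage (some versions write status to stderr, hence 2>&1):
# zk_mode "$(~/zookeeper-current/bin/zkServer.sh status 2>&1)"
```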

七、ZKFC

7.1 Start ZKFC

su hadoop
hadoop-daemon.sh start zkfc

7.2 Stop ZKFC

su hadoop
hadoop-daemon.sh stop zkfc

八、JournalNode

8.1 Start the JournalNode

su hadoop
hadoop-daemon.sh start journalnode 

8.2 Stop the JournalNode

su hadoop
hadoop-daemon.sh stop journalnode 

九、HDFS

9.1 Check HDFS status

su - hadoop
hdfs dfsadmin -report

9.2 Check HDFS space utilization

su hadoop
hdfs dfs -du -h /
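To flag unexpectedly large top-level paths, the raw byte counts from `hdfs dfs -du /` (run without -h so the first column stays numeric) can be filtered; the threshold and function name here are our own:

```shell
# Print paths whose first-column size exceeds the given byte threshold.
over_threshold() { awk -v t="$1" '$1 + 0 > t { print $NF }'; }

# Hypothetical usage: paths over 1 TB
# hdfs dfs -du / | over_threshold 1099511627776
```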

9.3 Locate a file by block ID

su hadoop
hdfs fsck -blockId ${blockId}

9.4 View configuration

cat ~/hadoop-current/etc/hadoop/hdfs-site.xml

十、Hive

10.1 Manually drop Hive partitions

ALTER TABLE xxxx DROP PARTITION (dt < '20230213');

10.2 Repair table partitions

MSCK REPAIR TABLE xxxx;

十一、HBase Cluster Check Procedure

Checklist:
1、Pod status check
    kubectl get pod -A | grep hbase | grep -v Running

2、NameNode checkpoint check
    Check the checkpoint state on master1 and master2: find when the newest fsimage file was written.
    ls -lt /home/hadoop/cluster-data/dfs/name/current | grep fsimage

    (If the newest fsimage is more than 2 hours old, contact @咸泽 for assistance; the usual remedy is to trigger a checkpoint by manually failing over the NameNode.)

3、Check HDFS cluster state
    # Check NameNode health
    hdfs haadmin -checkHealth nn1  (master1 is nn1)
    hdfs haadmin -checkHealth nn2  (master2 is nn2)

    # Check NameNode roles: exactly one should be active and one standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # Verify safe mode is off
    hdfs dfsadmin -safemode get

    # Check for missing blocks
    hdfs dfsadmin -report | grep 'Missing blocks'
    (Expected output: Missing blocks: 0)
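Steps 2 and 3 above can be partially automated. A sketch with two helpers (the names are ours; GNU `stat -c` is assumed for the mtime check, and the "Missing blocks" line matches the `dfsadmin -report` format shown above):

```shell
# Age in seconds of the newest fsimage_* file in a NameNode current/ dir (GNU stat).
newest_fsimage_age() {
  f=$(ls -t "$1"/fsimage_* 2>/dev/null | head -1)
  [ -n "$f" ] || return 1
  echo $(( $(date +%s) - $(stat -c %Y "$f") ))
}

# Parse the Missing blocks count out of `hdfs dfsadmin -report` output.
missing_blocks() { printf '%s\n' "$1" | awk -F': *' '/Missing blocks/ { print $2; exit }'; }

# Hypothetical usage against the cluster:
# [ "$(newest_fsimage_age /home/hadoop/cluster-data/dfs/name/current)" -lt 7200 ] || echo "checkpoint stale"
# [ "$(missing_blocks "$(hdfs dfsadmin -report)")" = 0 ] || echo "missing blocks!"
```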
