ZooKeeper exception reported while running HBase at work:
2013-06-28 18:26:59,946 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x33f7eb9d0650002-0x33f7eb9d0650002-0x33f7eb9d0650002 Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
2013-06-28 18:26:59,946 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x33f7eb9d0650002-0x33f7eb9d0650002-0x33f7eb9d0650002 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
2013-06-28 18:26:59,947 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x33f7eb9d0650002-0x33f7eb9d0650002-0x33f7eb9d0650002 Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)


The cause and fix below are excerpted from elsewhere and have not yet been verified.

Cause of the exception:

GC-related parameters in HBase (set via HBASE_OPTS, typically in conf/hbase-env.sh):

Before the change (default):

export HBASE_OPTS="$HBASE_OPTS -ea -verbose:gc -Xloggc:$HBASE_LOG_DIR/hbase.gc.log -XX:ErrorFile=$HBASE_LOG_DIR/hs_err_pid.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

After the change (as suggested by the development team):

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xloggc:$HBASE_LOG_DIR/hbase.gc.log -XX:ErrorFile=$HBASE_LOG_DIR/hs_err_pid.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"
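To confirm the new flags actually took effect after restarting the master, the running JVM can be inspected. This is only a quick sanity-check sketch; it assumes the JDK's jps/jinfo tools are on the PATH, and <pid> stands for the HMaster process id.

jps | grep HMaster                                  # find the HMaster pid
jinfo -flag CMSInitiatingOccupancyFraction <pid>    # should print -XX:CMSInitiatingOccupancyFraction=70
jinfo -flag UseParNewGC <pid>                       # should print -XX:+UseParNewGC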

-XX:+UseConcMarkSweepGC: use the concurrent (CMS) collector for the old generation. (Present in both the old and the new settings.)

Old: -XX:+CMSIncrementalMode: run CMS in incremental mode, which is meant for single-CPU machines.

New: -XX:+UseParNewGC: use the parallel collector for the young generation; it can be combined with the CMS collector.

-XX:CMSInitiatingOccupancyFraction=70: in my view this is the parameter that makes the biggest difference. The ultimate goal is to reduce full GCs, because a full GC blocks all other threads.

By default, CMS is triggered when the old generation is about 90% full; this percentage is controlled by -XX:CMSInitiatingOccupancyFraction=N. A concurrent mode failure happens in the following scenario:
When the old generation reaches 90%, CMS starts a concurrent collection, while at the same time the young generation keeps promoting objects into the old generation at a rapid pace. If the old generation fills up before CMS has finished its concurrent marking, disaster strikes: with no memory left, CMS has to abandon the mark and trigger a JVM-wide stop-the-world pause (all threads are suspended), then clean up all garbage objects with a single-threaded copying collection, i.e. a full GC. The very first stage of our bulk job is a burst of frequent drop-table and create-table operations, which consumes a large amount of the master's young-generation memory, producing exactly the scenario above and ending in a full GC. During such a long pause the master cannot heartbeat to ZooKeeper, which would explain the SessionExpiredException in the log above.
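To verify that this is really what happened, the GC log configured above (-Xloggc:$HBASE_LOG_DIR/hbase.gc.log) can be searched for concurrent mode failures and full GCs around the time of the session expiry. The commands below are only a sketch; the exact log wording depends on the JDK version.

grep -i "concurrent mode failure" $HBASE_LOG_DIR/hbase.gc.log
grep "Full GC" $HBASE_LOG_DIR/hbase.gc.log
# pauses longer than the ZooKeeper session timeout are the ones that expire the master's session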

Solution: CMSInitiatingOccupancyFraction=70 makes CMS start when the old generation is roughly 70% full, so full GCs no longer occur (or occur far less often).
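For reference, applying the change looks roughly like the following. This is only a sketch: it assumes $HBASE_HOME points at the HBase install directory, and the restart command may differ depending on how the cluster is managed.

# put the new HBASE_OPTS line into hbase-env.sh on the master, then restart the master
vi $HBASE_HOME/conf/hbase-env.sh
$HBASE_HOME/bin/hbase-daemon.sh restart master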