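Today I rebuilt a Hadoop 2.7.2 + ZooKeeper HA cluster by following my earlier post "Hadoop2.6.0 + zookeeper Cluster Environment Setup", so that failover happens automatically when a NameNode goes down. Overall it went fairly smoothly, and everything worked once the setup was finished. But when I restarted the cluster the next day, a problem appeared: one of the two NameNodes would never stay up. The details are as follows: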


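HA was configured as planned, but after startup the NameNode did not come up properly. Right after starting, jps showed the NameNode process, but a minute or two later it was gone. The log contained the following error: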

  org.apache.hadoop.ipc.Client: Retrying connect to server



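  • If JournalNode is started first and HDFS is started afterward, the NameNode starts and runs normally.
  • If everything is started with start-dfs.sh, the services all come up, but the NameNode exits after about two minutes. Starting it again on its own with hadoop-daemon.sh start namenode succeeds, and the NameNode then runs stably.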



2016-09-03 00:58:46,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop2/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)















      Indicates the number of retries a client will make to establish a server connection.

      Indicates the number of milliseconds a client will wait before retrying to establish a server connection.
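
The two descriptions above match the documentation strings for `ipc.client.connect.max.retries` and `ipc.client.connect.retry.interval` in Hadoop's core-default.xml, whose defaults (10 retries, 1000 ms) agree with the retry policy printed in the log. Since the NameNode only stays up when JournalNode is already running, a plausible reading is that under start-dfs.sh the NameNode exhausts its retries before JournalNode is ready. Assuming that is the case, a minimal sketch of an override in core-site.xml could look like the following. The values are illustrative assumptions, not taken from the original post, and should be tuned to how long JournalNode takes to start in your cluster.

```xml
<!-- core-site.xml: sketch only; both values below are assumed, not from the original post -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <!-- default is 10; raised so the NameNode keeps trying while JournalNode starts -->
  <value>100</value>
  <description>Indicates the number of retries a client will make to establish a server connection.</description>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <!-- default is 1000 ms; a longer wait between attempts -->
  <value>10000</value>
  <description>Indicates the number of milliseconds a client will wait before retrying to establish a server connection.</description>
</property>
```

After changing the file, it would need to be copied to every node and the cluster restarted for the new retry settings to take effect.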




