I won't go through how to set up a ZooKeeper cluster here; this post covers the problems I ran into when starting it after setup.

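I have four machines and use three of them for the ZooKeeper ensemble. Why only three? ZooKeeper needs a majority of servers to be up, so an odd number is preferred: a four-node ensemble tolerates only one failure, the same as a three-node one.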
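The arithmetic behind the odd-number preference can be sketched in a few lines of shell. This is only an illustration of majority quorums; the function names quorum_size and tolerated_failures are made up for this example and are not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Majority quorum for an ensemble of n voting servers: floor(n/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Servers that may fail while a majority can still be formed.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare three, four, and five servers.
for n in 3 4 5; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from three to four servers raises the quorum from 2 to 3 without raising the number of tolerated failures, which is why the fourth machine is left out here.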

The IPs and hostnames of the three machines are:

10.131.14.138 slave1

10.131.14.139 slave2

10.131.14.140 slave3

ZooKeeper configuration file:

zookeeper/conf/zoo.cfg

zoo.cfg does not exist at first; only zoo_sample.cfg does, so make a copy of it:

cp zoo_sample.cfg zoo.cfg

Contents of zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/usr/local/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.0=slave1:2888:3888
server.1=slave2:2888:3888
server.2=slave3:2888:3888

Finding the problem

Start the server and check the processes:

[root@slave1 data]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave1 data]# jps
14692 ResourceManager
20221 Jps
14399 NameNode

There is no ZooKeeper process (QuorumPeerMain). The log file zookeeper.out shows:

ERROR [main:QuorumPeerMain@88] - Invalid config, exiting abnormally

....

Caused by: java.lang.IllegalArgumentException: /usr/local/zookeeper/data/myid file is missing

Solution:

Create a myid file in /usr/local/zookeeper/data/ (the dataDir from zoo.cfg) on every node.

After starting again the result is the same: no process. The log shows:

ERROR [main:QuorumPeerMain@88] - Invalid config, exiting abnormally
...

Caused by: java.lang.IllegalArgumentException: serverid null is not a number

Solution:

Write a number into each node's myid file. At first I picked the numbers arbitrarily:

slave1: 1

slave2: 2

slave3: 3
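Both failures so far (a missing myid file, then an empty one) can be caught before starting the server. The following is a minimal sketch, not ZooKeeper's own validation; check_myid_file is a hypothetical helper name, and it only checks that the file exists and holds a non-negative integer.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: prints the id if the myid file exists and
# contains only digits, otherwise prints a reason to stderr and returns 1.
check_myid_file() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  local id
  # Ignore the trailing newline and any stray whitespace.
  id="$(tr -d '[:space:]' < "$file")"
  case "$id" in
    ''|*[!0-9]*)
      echo "not a number: '$id'" >&2
      return 1
      ;;
  esac
  echo "$id"
}

# Example (path taken from dataDir in zoo.cfg):
#   check_myid_file /usr/local/zookeeper/data/myid
```

Note that this check does not tell you whether the number is the right one for the node; that is the next problem.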

slave1 and slave2 then started, but slave3 did not. Its log:

2019-03-08 18:09:35,596 [myid:] - INFO  [main:QuorumPeerConfig@136] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
2019-03-08 18:09:35,609 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: slave2 to address: slave2/10.131.14.139
2019-03-08 18:09:35,609 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: slave1 to address: slave1/10.131.14.138
2019-03-08 18:09:35,610 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: slave3 to address: slave3/10.131.14.140
2019-03-08 18:09:35,610 [myid:] - INFO  [main:QuorumPeerConfig@398] - Defaulting to majority quorums
2019-03-08 18:09:35,612 [myid:3] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2019-03-08 18:09:35,612 [myid:3] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2019-03-08 18:09:35,613 [myid:3] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-03-08 18:09:35,620 [myid:3] - INFO  [main:QuorumPeerMain@130] - Starting quorum peer
2019-03-08 18:09:35,626 [myid:3] - INFO  [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-03-08 18:09:35,631 [myid:3] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-03-08 18:09:35,638 [myid:3] - INFO  [main:QuorumPeer@1158] - tickTime set to 2000
2019-03-08 18:09:35,638 [myid:3] - INFO  [main:QuorumPeer@1204] - initLimit set to 10
2019-03-08 18:09:35,638 [myid:3] - INFO  [main:QuorumPeer@1178] - minSessionTimeout set to -1
2019-03-08 18:09:35,639 [myid:3] - INFO  [main:QuorumPeer@1189] - maxSessionTimeout set to -1
2019-03-08 18:09:35,644 [myid:3] - ERROR [main:QuorumPeer@293] - Setting LearnerType to PARTICIPANT but 3 not in QuorumPeers.
2019-03-08 18:09:35,645 [myid:3] - INFO  [main:QuorumPeer@1467] - QuorumPeer communication is not secured!
2019-03-08 18:09:35,645 [myid:3] - INFO  [main:QuorumPeer@1496] - quorum.cnxn.threads.size set to 20
2019-03-08 18:09:35,647 [myid:3] - INFO  [main:QuorumPeer@668] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-03-08 18:09:35,677 [myid:3] - INFO  [main:QuorumPeer@683] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-03-08 18:09:35,703 [myid:3] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 3 not in the peer list
        at org.apache.zookeeper.server.quorum.QuorumPeer.startLeaderElection(QuorumPeer.java:718)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:637)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)

The key message is: My id 3 not in the peer list
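The condition behind this message can be reproduced with a one-line check against zoo.cfg. This is a rough sketch under the assumption that the ensemble is declared with plain server.N=host:port:port lines as in the config above; id_in_peer_list is a name invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds if zoo.cfg contains a "server.<id>=" line.
id_in_peer_list() {
  local cfg="$1" id="$2"
  grep -q "^server\.${id}=" "$cfg"
}

# Example:
#   id_in_peer_list /usr/local/zookeeper/conf/zoo.cfg 3 || echo "id 3 is not declared"
```

With the zoo.cfg from this post, ids 0, 1 and 2 pass and id 3 fails, which matches what slave3 reported.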

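Solution:

The myid of each node should be the number that follows "server." in that node's own line in zoo.cfg; otherwise startup fails. slave1 and slave2 only started earlier because 1 and 2 happen to appear in the peer list. Setting slave3's myid to 0 would also get it past this check, but then the ids no longer match the hosts they are declared for (slave3 would identify itself as server.0, which is slave1), so the ensemble would still be misconfigured.

So I changed the myid files of slave1, slave2 and slave3 to:

slave1: 0

slave2: 1

slave3: 2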
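Instead of typing the numbers by hand, the id can be looked up from zoo.cfg by hostname. The sketch below assumes the hostnames in zoo.cfg are exactly what the hostname command prints on each node; myid_for_host is a hypothetical helper, not a ZooKeeper tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N from the "server.N=host:port:port" line whose
# host part equals the given hostname. Prints nothing if there is no match.
myid_for_host() {
  awk -v host="$2" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")         # kv[1] = "server.N", kv[2] = "host:port:port"
      split(kv[2], addr, ":")    # addr[1] = host
      if (addr[1] == host) {
        sub(/^server\./, "", kv[1])
        print kv[1]
        exit
      }
    }' "$1"
}

# Example, run on each node (paths are the ones used in this post):
#   id="$(myid_for_host /usr/local/zookeeper/conf/zoo.cfg "$(hostname)")"
#   [ -n "$id" ] && echo "$id" > /usr/local/zookeeper/data/myid
```

The guard in the example matters: if the hostname is not found, writing the empty result would bring back the "serverid null is not a number" error.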

After starting again, check the processes with jps:

slave1

[root@slave1 data]# jps
20236 QuorumPeerMain
20317 Jps

slave2

[root@slave2 data]# jps
20202 QuorumPeerMain
20256 Jps

slave3

[root@slave3 data]# jps
18302 QuorumPeerMain
18348 Jps

All three nodes started successfully.
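jps only shows that the JVM is running. To see whether a node has actually joined the ensemble, zkServer.sh status can be used; it prints a "Mode:" line such as leader or follower. The small filter below is a sketch that assumes that output format, and zk_mode is a name made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `zkServer.sh status` output on stdin and print
# only the value of the "Mode:" line (for example leader or follower).
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Example, run on each node:
#   zkServer.sh status 2>&1 | zk_mode
```

In a healthy three-node ensemble one node should report leader and the other two follower; empty output suggests the node is not serving.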

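Finally, one more possible reason why some machines in a cluster fail to start ZooKeeper: their clocks may be out of sync. If you have tried many other fixes without success, try synchronizing the time. On CentOS 7 you can install a time synchronization tool with yum install ntp.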
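Before installing anything, a quick way to see whether the clocks really differ is to compare epoch seconds from two machines. This is only a rough sketch: clock_skew is a hypothetical helper, and the usage line assumes you can ssh from one node to another, which this post does not set up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: absolute difference, in seconds, between two epoch
# timestamps (as printed by `date +%s`).
clock_skew() {
  local diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  echo "$diff"
}

# Example, run on slave1 (assumes ssh access to slave2):
#   clock_skew "$(date +%s)" "$(ssh slave2 date +%s)"
```

The result includes the ssh round-trip time, so a difference of a second or two is not meaningful; only a large gap points at a clock problem.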
