ZooKeeper Distributed Mode

Installing ZooKeeper in distributed mode (a ZooKeeper cluster) is also fairly straightforward. The key points are outlined here.

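The first thing to be clear about is that a ZooKeeper cluster is a standalone distributed coordination service. "Standalone" means that any distributed application that wants to use ZooKeeper for coordination and management, and so simplify that work, can do so. This is possible thanks to ZooKeeper's data model (Data Model) and hierarchical namespace (Hierarchical Namespace) structure; for details, see http://zookeeper.apache.org/doc/trunk/zookeeperOver.html. When designing the coordination service for your distributed application, the first thing to consider is how to organize the hierarchical namespace.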

The installation and configuration of distributed mode are described below. The procedure is as follows:

Step 1: Configure hostname-to-IP address mappings

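A ZooKeeper cluster has two key roles: Leader and Follower. All nodes in the cluster serve distributed applications as a single unit, and every node connects to every other node. Therefore, when configuring a ZooKeeper cluster, the hostname-to-IP mappings on each node must include entries for all the other nodes in the cluster.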

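For example, on each node of my ZooKeeper cluster (taking master as an example), the contents of /etc/hosts are the same as those shown for the Hadoop setup.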
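One way to check this requirement before starting anything is to compare the server hostnames in zoo.cfg against a hosts file. The following is a minimal sketch, not part of ZooKeeper: the function name zk_missing_hosts is made up for illustration, and it assumes the server.N=host:port:port line format used in this article.

```shell
# List zoo.cfg server hostnames that have no entry in a hosts-format file.
# Arguments: $1 = path to zoo.cfg, $2 = path to a hosts file (e.g. /etc/hosts).
# zk_missing_hosts is a hypothetical helper name, not a ZooKeeper tool.
zk_missing_hosts() (
    cfg=$1
    hosts=$2
    # Server lines look like: server.1=cloud001:2888:3888
    sed -n 's/^server\.[0-9][0-9]*=\([^:]*\):.*$/\1/p' "$cfg" |
    while read -r name; do
        # A hosts line is "IP name [alias...]"; comment lines are skipped.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$hosts"
        then
            echo "$name"
        fi
    done
)
```

Running `zk_missing_hosts conf/zoo.cfg /etc/hosts` on each node should print nothing once the mappings are complete. For the two hosts used in this article, the startup log further down shows cloud001 at 172.24.241.56 and cloud002 at 172.18.19.37, so those are the entries each node would need.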


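ZooKeeper uses an election algorithm known as Leader election. While the cluster is running there is exactly one Leader, and all other nodes are Followers. If the Leader fails while the cluster is running, the system uses this algorithm to elect a new Leader. For this reason the nodes must be able to reach one another, which is why the mappings above are required.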

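When a ZooKeeper cluster starts, it first elects a Leader: during Leader election, a node that satisfies the election algorithm's criteria becomes the Leader. For the overall cluster architecture, see http://zookeeper.apache.org/doc/trunk/zookeeperOver.html#sc_designGoals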
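To make "satisfies the election algorithm's criteria" a little more concrete: as far as I understand the default fast leader election in ZooKeeper 3.4.x, votes are ordered by peer epoch first, then by zxid, then by server id, and the highest vote wins. The sketch below is a simplified illustration of that ordering only (it ignores quorum weights and the message exchange); the function name vote_wins is hypothetical.

```shell
# Return success (0) if the new vote should replace the current one.
# Votes are compared by epoch first, then zxid, then server id.
# All six arguments are plain decimal integers.
# Simplified illustration; not ZooKeeper's actual code.
vote_wins() (
    new_epoch=$1; new_zxid=$2; new_sid=$3
    cur_epoch=$4; cur_zxid=$5; cur_sid=$6
    if [ "$new_epoch" -ne "$cur_epoch" ]; then
        [ "$new_epoch" -gt "$cur_epoch" ]
    elif [ "$new_zxid" -ne "$cur_zxid" ]; then
        [ "$new_zxid" -gt "$cur_zxid" ]
    else
        [ "$new_sid" -gt "$cur_sid" ]
    fi
)
```

On a freshly installed cluster every server starts with epoch 0 and zxid 0, so under this ordering the server id breaks the tie: `vote_wins 0 0 2 0 0 1` succeeds, meaning a vote for server 2 beats a vote for server 1.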

Step 2: Edit the ZooKeeper configuration file

On one of the machines (master), extract zookeeper-3.4.6.tar.gz and edit the configuration file conf/zoo.cfg so that it contains the following:

tickTime=2000
dataDir=/home/xuhui/hadoop-2.2.0/tmp/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=cloud001:2888:3888
server.2=cloud002:2888:3888

For an explanation of these settings, see http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
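One detail worth spelling out is that initLimit and syncLimit are counted in ticks, not in milliseconds, so their real duration depends on tickTime. The helper below is a small sketch (zk_limit_ms is a made-up name) that reads both values from a zoo.cfg in the format shown above and prints the resulting time in milliseconds.

```shell
# Convert a tick-based setting from zoo.cfg into milliseconds.
# Arguments: $1 = path to zoo.cfg, $2 = setting name (initLimit or syncLimit).
# zk_limit_ms is a hypothetical helper name used for illustration.
zk_limit_ms() (
    cfg=$1
    key=$2
    tick=$(sed -n 's/^tickTime=//p' "$cfg")
    ticks=$(sed -n "s/^${key}=//p" "$cfg")
    if [ -z "$tick" ] || [ -z "$ticks" ]; then
        echo "tickTime or $key missing in $cfg" >&2
        exit 1
    fi
    echo $(( tick * ticks ))
)
```

With the configuration above, `zk_limit_ms conf/zoo.cfg initLimit` gives 10000 (a follower has 10 seconds to connect to and sync with the Leader), and `zk_limit_ms conf/zoo.cfg syncLimit` gives 4000 (a follower may fall at most 4 seconds behind the Leader).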

Step 3: Copy the installation to the other nodes

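ZooKeeper is now configured on one machine (cloud001 in this setup). The configured installation can now be copied remotely to the corresponding directory on each of the other nodes in the cluster: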

cd /home/xuhui/hadoop-2.2.0/
scp -r zookeeper-3.4.6/ xuhui@cloud002:/home/xuhui/hadoop-2.2.0/

Step 4: Set myid

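In the directory specified by dataDir, create a file named myid containing a single number that identifies the current host. The number must match the X of that host's server.X entry in conf/zoo.cfg. For example: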

xuhui@cloud001:~/hadoop-2.2.0/tmp$ mkdir zookeeper

xuhui@cloud001:~/hadoop-2.2.0$ echo "1" > /home/xuhui/hadoop-2.2.0/tmp/zookeeper/myid

xuhui@cloud002:~/hadoop-2.2.0/tmp$ mkdir zookeeper

xuhui@cloud002:~/hadoop-2.2.0$ echo "2" > /home/xuhui/hadoop-2.2.0/tmp/zookeeper/myid
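Typing the id by hand on every node is easy to get wrong, so the number can also be looked up from zoo.cfg. This is a sketch under the assumption that the hostname in the server.X line is exactly what the node calls itself; zk_myid_for is a hypothetical helper name.

```shell
# Print the server id configured for a hostname in zoo.cfg.
# Arguments: $1 = hostname, $2 = path to zoo.cfg.
# zk_myid_for is a hypothetical helper name, not a ZooKeeper tool.
zk_myid_for() (
    host=$1
    cfg=$2
    # Split "server.2=cloud002:2888:3888" at '=', then take the part of the
    # value before the first ':' as the hostname.
    awk -F= -v h="$host" '
        $1 ~ /^server\.[0-9]+$/ {
            split($2, addr, ":")
            if (addr[1] == h) { sub(/^server\./, "", $1); print $1 }
        }' "$cfg"
)
```

On each node, something like `zk_myid_for "$(hostname)" conf/zoo.cfg > /home/xuhui/hadoop-2.2.0/tmp/zookeeper/myid` would then write the matching id. If the command prints nothing, the local hostname does not appear in zoo.cfg and the file should not be written.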

Step 5: Start the ZooKeeper cluster

On every node in the ZooKeeper cluster, run the script that starts the ZooKeeper service, as follows:

xuhui@cloud001:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkServer.sh start

xuhui@cloud002:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkServer.sh start

Taking the master node (cloud001 in the log) as an example, the log looks like this:

xuhui@cloud001:~/hadoop-2.2.0/zookeeper-3.4.6$ tail -500f zookeeper.out 
2014-05-21 11:26:42,603 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf/zoo.cfg
2014-05-21 11:26:42,611 [myid:] - WARN  [main:QuorumPeerConfig@293] - No server failure will be tolerated. You need at least 3 servers.
2014-05-21 11:26:42,612 [myid:] - INFO  [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2014-05-21 11:26:42,626 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2014-05-21 11:26:42,627 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2014-05-21 11:26:42,627 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2014-05-21 11:26:42,646 [myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2014-05-21 11:26:42,695 [myid:1] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2014-05-21 11:26:42,744 [myid:1] - INFO  [main:QuorumPeer@959] - tickTime set to 2000
2014-05-21 11:26:42,744 [myid:1] - INFO  [main:QuorumPeer@979] - minSessionTimeout set to -1
2014-05-21 11:26:42,744 [myid:1] - INFO  [main:QuorumPeer@990] - maxSessionTimeout set to -1
2014-05-21 11:26:42,744 [myid:1] - INFO  [main:QuorumPeer@1005] - initLimit set to 5
2014-05-21 11:26:42,768 [myid:1] - INFO  [main:QuorumPeer@473] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2014-05-21 11:26:42,940 [myid:1] - INFO  [main:QuorumPeer@488] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2014-05-21 11:26:43,035 [myid:1] - INFO  [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: cloud001/172.24.241.56:3888
2014-05-21 11:26:43,050 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2014-05-21 11:26:43,054 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id =  1, proposed zxid=0x0
2014-05-21 11:26:43,057 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-05-21 11:26:43,085 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:744)
2014-05-21 11:26:43,263 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:43,265 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
2014-05-21 11:26:43,667 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:43,669 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
2014-05-21 11:26:44,471 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:44,473 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
2014-05-21 11:26:46,075 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:46,076 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200
2014-05-21 11:26:49,278 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:49,280 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 6400
2014-05-21 11:26:55,682 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:55,684 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 12800
2014-05-21 11:27:08,539 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:27:08,541 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
2014-05-21 11:27:34,143 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:27:34,145 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 51200
2014-05-21 11:28:25,347 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:28:25,349 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2014-05-21 11:28:30,573 [myid:1] - INFO  [cloud001/172.24.241.56:3888:QuorumCnxManager$Listener@511] - Received connection request /172.18.19.37:39108
2014-05-21 11:28:30,593 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-05-21 11:28:30,594 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-05-21 11:28:30,796 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@784] - FOLLOWING
2014-05-21 11:28:30,819 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@86] - TCP NoDelay set to: true
2014-05-21 11:28:30,830 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2014-05-21 11:28:30,830 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:host.name=cloud001
2014-05-21 11:28:30,830 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.version=1.7.0_45
2014-05-21 11:28:30,830 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.vendor=Oracle Corporation
2014-05-21 11:28:30,831 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.home=/usr/lib/jvm/jdk1.7.0_45/jre
2014-05-21 11:28:30,831 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.class.path=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/classes:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf:.:/usr/lib/jvm/jdk1.7.0_45/lib:/home/xuhui/hadoop-2.2.0/mahout-distribution-0.9/lib:/usr/lib/jvm/jdk1.7.0_45/jre/lib:.:/usr/lib/jvm/jdk1.7.0_45/lib:/lib:/usr/lib/jvm/jdk1.7.0_45/jre/lib:
2014-05-21 11:28:30,831 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
2014-05-21 11:28:30,831 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.io.tmpdir=/tmp
2014-05-21 11:28:30,831 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.compiler=<NA>
2014-05-21 11:28:30,836 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:os.name=Linux
2014-05-21 11:28:30,836 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:os.arch=i386
2014-05-21 11:28:30,836 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:os.version=3.8.0-29-generic
2014-05-21 11:28:30,837 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:user.name=xuhui
2014-05-21 11:28:30,837 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:user.home=/home/xuhui
2014-05-21 11:28:30,837 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:user.dir=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6
2014-05-21 11:28:30,839 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/xuhui/hadoop-2.2.0/tmp/zookeeper/version-2 snapdir /home/xuhui/hadoop-2.2.0/tmp/zookeeper/version-2
2014-05-21 11:28:30,840 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 107786
2014-05-21 11:28:31,367 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@323] - Getting a diff from the leader 0x0
2014-05-21 11:28:31,371 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting: 0x0 to /home/xuhui/hadoop-2.2.0/tmp/zookeeper/version-2/snapshot.0

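I started the nodes one at a time, cloud001 first and then cloud002. When a ZooKeeper cluster starts, every node tries to connect to all the other nodes, and a node started earlier cannot reach one that has not started yet. The exceptions in the first part of the log above (java.net.ConnectException: 拒绝连接, that is, "Connection refused") can therefore be ignored. The later part of the log shows that the cluster settled down once a Leader had been elected.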
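The "Notification time out" lines in the log also show how a waiting node paces its retries: the value doubles after each failed round (400, 800, 1600, and so on) until it stops growing at 60000. The sketch below reproduces that sequence. The 200 ms starting value and the 60000 ms cap are my assumptions, chosen because they match the numbers printed in the log; the function name is hypothetical.

```shell
# Print the first N election notification timeouts in milliseconds.
# Arguments: $1 = how many values to print.
# Assumed constants (inferred from the log): 200 ms start, 60000 ms cap.
zk_notification_timeouts() (
    count=$1
    timeout=200
    i=0
    while [ "$i" -lt "$count" ]; do
        timeout=$(( timeout * 2 ))
        if [ "$timeout" -gt 60000 ]; then
            timeout=60000
        fi
        echo "$timeout"
        i=$(( i + 1 ))
    done
)
```

`zk_notification_timeouts 9` prints the same nine values that appear in the log, which is why the gaps between the "Cannot open channel" warnings grow from well under a second to about a minute while cloud002 is still down.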

Other nodes may show similar messages; this is normal.
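One warning near the top of the log does deserve attention: "No server failure will be tolerated. You need at least 3 servers." ZooKeeper needs a strict majority of the configured servers to be up, so a two-server cluster stops serving as soon as either server fails. The arithmetic can be sketched as follows (both function names are hypothetical):

```shell
# Smallest strict majority of $1 voting servers.
zk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that may fail while a majority is still available.
zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

For the two servers configured in this article, `zk_quorum 2` is 2 and `zk_tolerated_failures 2` is 0. With three servers the quorum is still 2, so one failure can be tolerated, which is why odd cluster sizes of three or more are the usual recommendation.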

Step 6: Verify the installation

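ZooKeeper's server script can be used to check the startup status, including the role of each node in the cluster (either Leader or Follower). Run it on every node of the cluster; the output below is from cloud002: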

xuhui@cloud002:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader

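The status output above shows that cloud002 is the cluster's Leader. The other node, cloud001, is a Follower, as its log above (which ends in the FOLLOWING state) confirms.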
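When checking several nodes, only the "Mode:" line matters. A small filter like the one below (zk_mode is a made-up name) extracts just the role from the status text, assuming the output format shown above.

```shell
# Read status text on stdin and print only the server mode
# (for example: leader, follower, standalone).
# zk_mode is a hypothetical helper name.
zk_mode() {
    sed -n 's/^Mode:[ ]*//p'
}

# Example use on a node (commented out so the block has no side effects):
#   bin/zkServer.sh status 2>&1 | zk_mode
```

On the cluster in this article this would be expected to print leader on cloud002 and follower on cloud001.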

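In addition, you can connect to the ZooKeeper cluster with the client script. To a client, ZooKeeper is a single unit (ensemble); a client connected to the cluster effectively feels as though it has the whole cluster's service to itself. You can therefore establish a connection to the service from any node, for example: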


xuhui@cloud002:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkCli.sh -server cloud002:2181
Connecting to cloud002:2181
2014-05-21 11:38:55,520 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2014-05-21 11:38:55,523 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=cloud002
2014-05-21 11:38:55,524 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_45
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/jdk1.7.0_45/jre
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/classes:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf:.:/usr/lib/jvm/jdk1.7.0_45/lib:/usr/lib/jvm/jdk1.7.0_45/jre/lib:
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2014-05-21 11:38:55,526 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=i386
2014-05-21 11:38:55,527 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.8.0-29-generic
2014-05-21 11:38:55,527 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=xuhui
2014-05-21 11:38:55,527 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/xuhui
2014-05-21 11:38:55,527 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6
2014-05-21 11:38:55,528 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=cloud002:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@a61d64
Welcome to ZooKeeper!
2014-05-21 11:38:55,552 [myid:] - INFO  [main-SendThread(cloud002:2181):ClientCnxn$SendThread@975] - Opening socket connection to server cloud002/172.18.19.37:2181. Will not attempt to authenticate using SASL (unknown error)
2014-05-21 11:38:55,575 [myid:] - INFO  [main-SendThread(cloud002:2181):ClientCnxn$SendThread@852] - Socket connection established to cloud002/172.18.19.37:2181, initiating session
JLine support is enabled
[zk: cloud002:2181(CONNECTING) 0] 2014-05-21 11:38:55,744 [myid:] - INFO  [main-SendThread(cloud002:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server cloud002/172.18.19.37:2181, sessionid = 0x2461cd2455b0000, negotiated timeout = 30000


WATCHER::


WatchedEvent state:SyncConnected type:None path:null


[zk: cloud002:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: cloud002:2181(CONNECTED) 2] 


The root znode / currently has a single child, zookeeper (that is, /zookeeper).
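Because the namespace is hierarchical, every znode lives under a parent, and in the 3.4.x client a parent has to exist before a child can be created under it. When planning an application's layout it can help to list the levels a path implies. The sketch below does that for any path; the path /app/config/db used in the note is purely hypothetical, as is the function name.

```shell
# Print every level of a znode path, shallowest first, ending with the path itself.
# Arguments: $1 = absolute znode path.
# znode_ancestors is a hypothetical helper name.
znode_ancestors() (
    path=$1
    prefix=
    IFS=/        # split the path on '/' (only inside this subshell)
    set -f       # no filename globbing on the path components
    for part in $path; do
        # Skip the empty field produced by the leading '/'.
        [ -n "$part" ] || continue
        prefix="$prefix/$part"
        echo "$prefix"
    done
)
```

`znode_ancestors /app/config/db` prints /app, /app/config, and /app/config/db, one per line; each of these would need to be created in that order (for example with the client's create command) before the deepest znode can hold data.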


Summary and Notes

Hostname-to-IP address mapping problems

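When starting the ZooKeeper cluster, a node's log may show errors like the following. This example comes from node slave-01 in a cluster whose nodes are named slave-01, slave-02, and slave-03: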


java.net.SocketTimeoutException
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2012-01-08 06:37:46,026 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@697] - Notification time out: 6400
2012-01-08 06:37:57,431 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address slave-02/202.106.199.35:3888
java.net.SocketTimeoutException
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2012-01-08 06:38:02,442 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address slave-03/202.106.199.35:3888

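Clearly, when slave-01 tried to connect to the other nodes in the cluster (slave-02 and slave-03) at startup, their hostnames resolved to an IP address different from the ones actually configured. As a result the nodes could not establish links to one another, and the whole ZooKeeper cluster failed to start.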

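In the error log above, slave-02/202.106.199.35:3888 should have been slave-02/192.168.0.178:3888, but name resolution produced the wrong mapping. The fix is to edit /etc/hosts on every node and add the hostname-to-IP mappings for all nodes in the ZooKeeper cluster.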
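A quick way to spot this kind of problem is to pull the hostname and resolved IP out of every "Cannot open channel" warning and compare them with the addresses you expect. The filter below is a sketch that assumes the log line format shown above; zk_unreachable_peers is a hypothetical name.

```shell
# Read ZooKeeper server log text on stdin and print each distinct
# "server-id hostname ip" triple found in "Cannot open channel" warnings.
# zk_unreachable_peers is a hypothetical helper name.
zk_unreachable_peers() {
    sed -n 's|.*Cannot open channel to \([0-9][0-9]*\) at election address \([^/]*\)/\([^:]*\):.*|\1 \2 \3|p' |
    sort -u
}

# Example use (commented out so the block has no side effects):
#   zk_unreachable_peers < zookeeper.out
```

Applied to the error log above, it reports both slave-02 and slave-03 at 202.106.199.35. Two different hosts resolving to the same unexpected address is a strong hint that the names are being resolved somewhere other than the local /etc/hosts.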

References

http://blog.csdn.net/shirdrn/article/details/7183503#
