ZooKeeper ensemble not coming up

Posted 2019-02-18 05:11

Question:

I am trying to configure a three-node ZooKeeper ensemble by following the documentation. All nodes run Ubuntu Linux. The configuration file is identical on all three nodes:

zoo.cfg under $ZOOKEEPER_HOME/conf

tickTime=2000
dataDir=/home/zkuser/zookeeper_data
clientPort=2181
initLimit=5
syncLimit=2
server.1=ip.of.zk1:2888:3888
server.2=ip.of.zk2:2888:3888
server.3=ip.of.zk3:2888:3888

I've also placed the respective "myid" files under the /home/zkuser/zookeeper_data/ directory. The myid file on the first node (ip.of.zk1) contains 1, the one on ip.of.zk2 contains 2, and so on.
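
The pairing between myid and the server.N lines is easy to get wrong, so here is a minimal Python sketch of a sanity check. The helper names parse_servers and check_myid are hypothetical (they are not part of ZooKeeper), and the parser only handles the plain host:port:port form shown above, not IPv6 literals.

```python
import re


def parse_servers(cfg_text):
    """Return {server_id: host} for every server.N=host:port:port line."""
    servers = {}
    for line in cfg_text.splitlines():
        # Commented-out lines start with '#' and therefore do not match.
        m = re.match(r"server\.(\d+)\s*=\s*([^:\s]+):\d+:\d+", line.strip())
        if m:
            servers[int(m.group(1))] = m.group(2)
    return servers


def check_myid(cfg_text, myid_text):
    """Return (ok, message) describing whether myid has a server.N entry."""
    servers = parse_servers(cfg_text)
    try:
        myid = int(myid_text.strip())
    except ValueError:
        return False, "myid must contain a single integer"
    if myid not in servers:
        return False, f"myid {myid} has no matching server.{myid} entry"
    return True, f"myid {myid} maps to {servers[myid]}"
```

On each node you would pass in the contents of zoo.cfg and of the myid file, then compare the reported host with that node's own address.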

When I start the zk server using bin/zkServer.sh start, no exception is shown on the console. However, when I open the zookeeper.out file under the bin directory, I see the following errors.

2014-11-04 00:23:49,120 [myid:3] - WARN  [WorkerSender[myid=3]:QuorumCnxManager@382] - Cannot open channel to 1 at election address /ip.of.zk1:3888
java.net.NoRouteToHostException: No route to host
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
    at java.net.Socket.connect(Socket.java:546)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:701)
2014-11-04 00:23:49,123 [myid:3] - WARN  [WorkerSender[myid=3]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /ip.of.zk2:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
    at java.net.Socket.connect(Socket.java:546)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:701)
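
The two exceptions in this log usually point to different causes: "No route to host" suggests a network-level problem (routing, or a firewall rejecting the packet), while "Connection refused" means the host was reached but nothing accepted the connection on that address and port. A small Python sketch for telling them apart from any node follows; the function name probe is hypothetical, and the labels it returns are my own.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect and return a short label for the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on this address/port.
        return "refused"
    except socket.timeout:
        # Packets are probably being dropped silently.
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Java reports this case as NoRouteToHostException.
            return "no-route"
        return f"error: {exc}"
```

For the setup above, one would run for example probe("ip.of.zk1", 3888) and probe("ip.of.zk2", 3888) from node 3 and compare the results with the log.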

Note: I've opened the appropriate ports using iptables on each machine. For example:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  IP.of.ZK1       anywhere            
ACCEPT     all  --  IP.of.ZK2       anywhere            
ACCEPT     all  --  IP.of.ZK3       anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination    

Can anyone please tell me what I'm missing?

Regards, JE

Answer 1:

Ensure that:

  • you have started the ZooKeeper server on all 3 nodes
  • every server is running without errors; check with echo ruok | netcat ip.of.zk2 2181. A healthy server responds with imok (FYI, here's a list of all supported 4-letter commands)
  • /home/zkuser/zookeeper_data/myid contains 1, 2, and 3 on the respective servers
  • you can ping the other 2 servers from the first server
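
The ruok check from the list above can also be scripted. This is a minimal Python sketch, assuming the server closes the connection after replying (which is what makes the read loop terminate); the function name four_letter is hypothetical. Note that newer ZooKeeper releases restrict four-letter commands through the 4lw.commands.whitelist setting, so an empty reply does not necessarily mean the server is down.

```python
import socket


def four_letter(host, port, cmd="ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling four_letter("ip.of.zk2", 2181) against a healthy server should return imok, the same reply the netcat command gives.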

If you're interested, I have created a Vagrant + Ansible script that sets up a 3-node virtual ZooKeeper cluster; see https://github.com/mkrcah/virtual-zookeeper-cluster



Answer 2:

I had a similar issue and got some hints about the cause from here and here. In my case, the output of netstat -plutn showed the election port bound to 127.0.0.1:3888. I solved the problem by changing part of zoo.cfg on server n from something like

server.1=name.of.s1:2888:3888
...
server.n=localhost:2888:3888
...

to

server.1=name.of.s1:2888:3888
...
server.n=0.0.0.0:2888:3888
...

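After restarting ZooKeeper, the output of netstat -plutn includes :::3888.

Apparently this is needed for ZooKeeper to expose the election port (3888 here) properly: with localhost in its own entry, the server listens only on the loopback interface, so the other nodes cannot connect to it.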
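
To spot this kind of entry before starting the servers, here is a rough Python sketch that flags server.N lines pointing at a loopback address. The function name loopback_servers is hypothetical, and the check only covers localhost and 127.x.x.x hosts written in the plain host:port:port form.

```python
import re


def loopback_servers(cfg_text):
    """Return the ids of server.N entries whose host is a loopback name."""
    flagged = []
    for line in cfg_text.splitlines():
        m = re.match(r"\s*server\.(\d+)\s*=\s*([^:\s]+):", line)
        if not m:
            continue
        host = m.group(2).lower()
        if host == "localhost" or host.startswith("127."):
            flagged.append(int(m.group(1)))
    return flagged
```

Any id it returns is an entry that other nodes would resolve to themselves, or that the local server would bind to loopback only.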