可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have installed zookeeper in 3 different aws servers. The following is the configuration in all the servers
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=x.x.x.x:2888:3888
server.2=x.x.x.x:2888:3888
server.3=x.x.x.x:2888:3888
All the three instance have a myid
file at var/zookeeper
with appropriate id in it. All the three servers have all ports open from the aws console. But when I run the zookeeper server, I get the following error in all the instances.
2015-06-19 12:09:22,989 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382]
- Cannot open channel to 2 at election address /x.x.x.x:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-06-19 12:09:23,170 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382]
- Cannot open channel to 3 at election address /x.x.x.x:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-06-19 12:09:23,170 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
回答1:
How have defined the ip of the local server in each node? If you have given the public ip, then the listener would have failed to connect to the port. You must specify 0.0.0.0 for the current node
server.1=0.0.0.0:2888:3888
server.2=192.168.10.10:2888:3888
server.3=192.168.2.1:2888:3888
This change must be performed at the other nodes too.
回答2:
I met the save question and solved it.
make sure the myid is the save with your configuration in the zoo.cfg.
please check your zoo.cfg file in your conf
directory, which contains such content.
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
and check the myid in your server dataDir directory. For example:
let's say the dataDir
defined on the zoo.cfg
is '/home/admin/data'
then on zookeeper1, you must have a file named myid and have value 1 on this file ;on zookeeper2, you must have a file named myid and have value 2 on this file; on zookeeper3, you must have a file named myid and have value 3 on this file.
if not configured like this, the server will listen on a wrong ip:port.
回答3:
Here is some ansible jinja2 template info for automating the build of a cluster with the 0.0.0.0 hostname in zoo.cfg
{% for url in zookeeper_hosts_list %}
{%- set url_host = url.split(':')[0] -%}
{%- if url_host == ansible_fqdn or url_host in ansible_all_ipv4_addresses -%}
server.{{loop.index0}}=0.0.0.0:2888:3888
{% else %}
server.{{loop.index0}}={{url_host}}:2888:3888
{% endif %}
{% endfor %}
回答4:
If your own hostname resolves to 127.0.0.1 (In my case, the hostname was in /etc/hosts), zookeeper won't start up without having 0.0.0.0 in the zoo.cfg file, but if your hostname resolves to the actual machine's IP, you can put it's own hostname in the config file.
回答5:
This is what worked for me
Step 1:
Node 1:
zoo.cfg
server.1= 0.0.0.0:<port>:<port2>
server.2= <IP>:<port>:<port2>
.
.
.
server.n= <IP>:<port>:<port2>
Node 2 :
server.1= <IP>:<port>:<port2>
server.2= 0.0.0.0:<port>:<port2>
.
.
.
server.n= <IP>:<port>:<port2>
Now in location defined by datadir on your zoo.cfg
Node 1:
echo 1 > <datadir>/id
Node 2:
echo 2 > <datadir>/id
.
.
.
Node n:
echo n > <datadir>/id
This one helped me to start zoo keeper successfully but will know more once i start playing with it. Hope this helps.
回答6:
Had similar issues on a 3-Node zookeeper ensemble.
Solution was as advised by espeirasbora and restarted.
So this was what I did
zookeeper1,zookeeper2 and zookeeper3
A. Issue :: znodes in my ensemble could not start
B. System SetUp ::
3 Znodes in three 3 machines
C. Error::
In my zookeper log file I could see the following errors
2016-06-26 14:10:17,484 [myid:1] - WARN [SyncThread:1:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:1 took 1340ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-06-26 14:10:17,847 [myid:1] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@810] - Connection broken for id 2, my id = 1, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2016-06-26 14:10:17,848 [myid:1] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@813] - Interrupting SendWorker
2016-06-26 14:10:17,849 [myid:1] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
2016-06-26 14:10:17,851 [myid:1] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@736] - Send worker leaving thread
2016-06-26 14:10:17,852 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
2016-06-26 14:10:17,854 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
D. Actions & Resolution ::
On each znode
a. I modified the configuration file $ZOOKEEPER_HOME/conf/zoo.cfg to set
the machines IP to "0.0.0.0" while maintaining the IP addressof the other 2 znodes.
b. restarted the znode
c. checked the status
d.Voila , I was ok
See below
-------------------------------------------------
on Zookeeper1
#Before modification
[zookeeper1]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#After modification
[zookeeper1]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=0.0.0.0:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#Start the Zookeper (Stop and STart or restart )
[zookeeper1]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[zookeeper1]$ $ZOOKEEPER_HOME/bin/zkServer.sh status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
---------------------------------------------------------
on Zookeeper2
#Before modification
[zookeeper2]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#After modification
[zookeeper2]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=0.0.0.0:2888:3888
server.3=zookeeper3:2888:3888
#Start the Zookeper (Stop and STart or restart )
[zookeeper2]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[zookeeper2]$ $ZOOKEEPER_HOME/bin/zkServer.sh status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
---------------------------------------------------------
on Zookeeper3
#Before modification
[zookeeper3]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
#After modification
[zookeeper3]$ tail -3 $ZOOKEEPER_HOME/conf/zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=0.0.0.0:2888:3888
#Start the Zookeper (Stop and STart or restart )
[zookeeper3]$ $ZOOKEEPER_HOME/bin/zkServer.sh start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[zookeeper3]$ $ZOOKEEPER_HOME/bin/zkServer.sh status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
回答7:
In mycase, the issue was, I had to start all the three zookeeper servers, Only then I was able to connect to zookeeper server using ./zkCli.sh
回答8:
We faced the same issue , for our case the root cause of the problem is too-many number of client connections . The default ulimit on aws ec2 instance is 1024 and this causes zookeeper nodes not able to communicate with each other .
The fix for this is
change the ulimit to a higher number -> (> ulimit -n 20000 )
stop and start zookeeper.
回答9:
I had a similar issue. The status on 2 of my three zookeeper nodes was listed as "standalone", even though the zoo.cfg file indicated that it should be clustered. My third node couldn't start, with the error you described. I think what fixed it for me was running zkServer.sh start
in quick succession across my three nodes, such that zookeeper was running before the zoo.cfg initLimit was reached. Hope this works for someone out there.