ZooKeeper keeps getting EndOfStreamException, caus

Posted 2020-05-31 10:02

Question:

My ZooKeeper instance manages several queues for different jobs by holding the relevant job data in each node until the computer is ready to process it. If I stop the overall service so that no jobs can be started, ZooKeeper runs just fine after a restart. However, some of these jobs seem to cause ZooKeeper to crash with the following message in the ZooKeeper log:

WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15677f740ad002a, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:46998 which had sessionid 0x15677f740ad002a
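
One way to start narrowing this down is to tally which clients and sessions the server reports as closed, since a single misbehaving client would show up repeatedly. The following is a minimal Python sketch, assuming log lines in the format shown above. The function name count_closed_sessions is hypothetical, and the regular expression is fitted only to the "Closed socket connection" INFO line.

```python
import re
from collections import Counter

# Matches the "Closed socket connection" INFO line from the server log.
CLOSED_RE = re.compile(
    r"Closed socket connection for client /(?P<addr>\S+) "
    r"which had sessionid (?P<sid>0x[0-9a-fA-F]+)"
)


def count_closed_sessions(lines):
    """Count closed connections per (client host, session id)."""
    counts = Counter()
    for line in lines:
        match = CLOSED_RE.search(line)
        if match:
            # Drop the ephemeral port so reconnects from one host group together.
            host = match.group("addr").rsplit(":", 1)[0]
            counts[(host, match.group("sid"))] += 1
    return counts
```

Passing an open log file (any iterable of lines) to the function and calling most_common() on the result would list the hosts and sessions that disconnect most often.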

My ZooKeeper knowledge is very limited, as I am taking over from the guy that set it up originally.

I have tried deleting a lot of nodes with rmr [path] in the ZooKeeper shell, which seemed to have some effect (I deleted 50k+ nodes that were left over or of no use), but it has kept crashing daily, and last night I couldn't get it to run for more than a couple of minutes before the same error/crash occurred.

How do I find out what is causing this?

I am pretty sure it is some general problem with the data that is received, or with the stored data/nodes. The disk is only 92% full. I also found this post: Zookeeper keeps getting the WARN: "caught end of stream exception", but the solution doesn't make much sense to me. Also, I am pretty sure that none of the messages kept in my znodes are larger than 1MB, but I am unsure how to confirm this.
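
One way to confirm the znode sizes is to walk the tree and measure each node's data. The following is a minimal Python sketch, not a definitive tool. It assumes a client object that offers get(path) returning (data, stat) and get_children(path) returning child names, as kazoo's KazooClient does. The function name find_large_znodes is hypothetical. The default threshold is ZooKeeper's default jute.maxbuffer value of 0xfffff bytes, just under 1MB.

```python
JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # ZooKeeper's default jute.maxbuffer, in bytes


def find_large_znodes(client, root="/", limit=JUTE_MAXBUFFER_DEFAULT):
    """Return a sorted list of (path, size_in_bytes) for znodes under `root`
    whose data is larger than `limit` bytes.

    `client` is assumed to provide get(path) -> (data, stat) and
    get_children(path) -> list of child names.
    """
    large = []
    stack = [root]
    while stack:
        path = stack.pop()
        data, _stat = client.get(path)
        size = len(data or b"")  # data may be None for an empty znode
        if size > limit:
            large.append((path, size))
        for name in client.get_children(path):
            # rstrip keeps children of "/" from getting a double slash.
            stack.append(path.rstrip("/") + "/" + name)
    return sorted(large)
```

With kazoo, the client could be created with KazooClient(hosts="127.0.0.1:2181") and started with start() before the call. The host and port here are taken from the log above and may differ on other setups. Passing a lower limit, such as 512 * 1024, would also show nodes that are approaching the buffer size. A znode with a very large number of children may also produce a child-list response that exceeds the buffer even when every individual node is small, which this sketch does not check.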

Is there some way I can change the ZooKeeper log so that I can print additional information, such as the content/name of the znode it is operating on before it crashes?

Answer 1:

I was able to solve the problem by deleting all zookeeper snapshots and log files from the server running ZooKeeper. I don't know why this made a difference, but it has been running fine for the last 22 hours.
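
ZooKeeper does not remove old snapshots and transaction logs unless purging is enabled (for example through the autopurge.snapRetainCount and autopurge.purgeInterval settings in zoo.cfg), so these files can accumulate over time. To see how much space they take before deleting anything, a minimal Python sketch along these lines may help. It assumes the files live in a version-2 directory under the configured dataDir and are named snapshot.* and log.*. The function name snapshot_and_log_usage is hypothetical.

```python
import os


def snapshot_and_log_usage(version_dir):
    """Return (file_count, total_bytes) for snapshot.* and log.* files
    found directly inside `version_dir`."""
    count = 0
    total = 0
    for entry in os.scandir(version_dir):
        if entry.is_file() and entry.name.startswith(("snapshot.", "log.")):
            count += 1
            total += entry.stat().st_size
    return count, total
```

Calling it with the path to the version-2 directory returns the number of matching files and their combined size in bytes. The sketch only reads file metadata and deletes nothing.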