SolrCloud is detecting non-existent nodes

Posted 2019-09-17 18:14

Question:

I am seeing a strange situation with SolrCloud. For a reason I don't understand, a Solr instance that is not supposed to be part of the cloud is displayed on the SolrCloud page and is also visible under the live_nodes path in ZooKeeper.

Here are details about the situation:

I have one Solr instance running as a standalone application on a virtual machine located on a remote machine. I will call it virtual1 from now on.

This is the script for running it:

java \
  -server \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+UseG1GC \
  -XX:+UseCompressedStrings \
  -Dcom.sun.management.jmxremote \
  -d64 \
  -Xmx4096m \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Djava.rmi.server.hostname=remotehost \
  -jar start.jar

This instance is running on port 8983, so when you go to virtual1:8983 you see the classic Solr admin page. The rest of the configuration is identical to the example Solr that ships with the Solr distributions.

Then, on my local machine (called local from now on), I am running my ZooKeeper servers on ports 2181 and 2182.

To add my Solr instances to the cloud, I simply run one instance on my local machine and two more on virtual1. The scripts for starting them are below:

Solr instance on my local machine:

java -Dbootstrap_conf=true -DzkHost=zkhost:2181 -Djetty.port=8984 -jar start.jar

Solr instances on the remote machine:

java -DzkHost=zkhost:2181 -Djetty.port=8985 -jar start.jar
java -DzkHost=zkhost:2182 -Djetty.port=8986 -jar start.jar

Up to this point, there are no exceptions or errors in either the Solr or the ZooKeeper logs.

When I check virtual1:8985 and virtual1:8986, both are running, as is the instance on my local machine.

But when I check the cloud (both from the Solr admin page and from the ZooKeeper CLI), I can only see local:8983 and virtual1:8983, while virtual1:8985 and virtual1:8986 are not added at all. The weird part is that virtual1:8983 knows nothing about the ZooKeeper servers, as you can see from the startup scripts above.

In addition to the above, I tried one more thing. On another virtual machine (virtual2), which runs on the same physical host as virtual1, I created Solr instances with:

java -DzkHost=zkhost:2181 -Djetty.port=8985 -jar start.jar
java -DzkHost=zkhost:2182 -Djetty.port=8986 -jar start.jar

So in this case I should have the instances virtual2:8985 and virtual2:8986 in the cloud. But that does not happen. I can only see virtual2:8983, which does not actually exist. The cloud simply shows the port of the standalone Solr that is running on virtual1.

Can anyone explain why this is happening?

Answer 1:

Try removing the ZooKeeper data and starting Solr with -Dbootstrap_confdir=/path/to/conf/dir, which pushes the configs and node state to ZooKeeper. Also, if you are running a ZooKeeper ensemble, check the state of each server (leader or follower, or whether there are no connections to the node) with

echo stat | nc zk_host zk_port
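
To compare several ensemble members quickly, it can help to keep only the role line of that output. A minimal sketch, assuming the usual "Mode: ..." line that ZooKeeper prints in its stat response; the function name zk_mode is made up for this example:

```shell
# Print the server role ("leader", "follower" or "standalone") found in the
# text that ZooKeeper returns for the four-letter command `stat`.
# Reads that text on stdin; zk_mode is a hypothetical helper name.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Possible usage against each server from the question (not run here):
#   echo stat | nc zk_host 2181 | zk_mode
#   echo stat | nc zk_host 2182 | zk_mode
```

In a healthy two-server ensemble one call should report leader and the other follower. If both report standalone, the servers are most likely not configured as an ensemble at all.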

You can also use zkCli.sh to inspect the cluster state with

get /clusterstate.json
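
Since the question is about which nodes appear under live_nodes, listing that path directly may be just as useful. A rough sketch, assuming zkCli.sh prints the children as a bracketed list and that Solr registers nodes under its default name form host:port_context (for example virtual1:8985_solr); live_node_addresses is a hypothetical helper name, and grep -o needs GNU or BSD grep:

```shell
# Turn the bracketed list printed by `ls /live_nodes` into one host:port
# per line. Assumes node names of the form host:port_context.
live_node_addresses() {
  grep -o '[^][, ]*:[0-9][0-9]*_[^][, ]*' | sed 's/_.*$//'
}

# Possible usage (not run here):
#   zkCli.sh -server zkhost:2181 ls /live_nodes | live_node_addresses
```

Any address in that output that does not match a Solr instance you actually started with -DzkHost points at stale or wrongly registered node data in ZooKeeper.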

Please also show your ZooKeeper configuration; it should look something like this:

clientPort=2181 (2182 for second instance)
server.1=zk_host:2888:3888
server.2=zk_host:2889:3889 (the peer ports must differ when both servers run on the same host)
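
In the question both ZooKeeper servers run on the same machine, and in that setup two server.N entries cannot share a quorum or election port on the same host. A small sketch for spotting such clashes in a config file; zk_port_collisions is a hypothetical helper name, and the zoo.cfg path in the usage line is only an assumption:

```shell
# Print every host:port that is claimed more than once by the server.N
# lines, counting both the quorum port and the election port.
# Reads zoo.cfg-style text on stdin.
zk_port_collisions() {
  awk -F'[=:]' '/^server\.[0-9]+=/ {
    if (++seen[$2 ":" $3] == 2) print $2 ":" $3
    if (++seen[$2 ":" $4] == 2) print $2 ":" $4
  }'
}

# Possible usage (not run here):
#   zk_port_collisions < conf/zoo.cfg
```

Empty output means no two entries claim the same host and port.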

Also check both the Solr and the ZooKeeper logs for successful connections.