Unable to start a node manager on master

Posted 2019-01-26 01:01

Question:

I am setting up a Hadoop YARN cluster and using one machine as both master and slave. When I start YARN with the following command, the NodeManager starts on the slaves but not on the master node.

sbin/yarn-daemons.sh start nodemanager

The master also acts as a slave, and the cluster has two other slaves. The NodeManagers on those two slaves start properly.

The error I get on the master:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException

Output of some relevant commands:

cat /etc/services | grep 8040
ampify          8040/tcp                # Ampify Messaging Protocol
ampify          8040/udp                # Ampify Messaging Protocol

lsof -i tcp:8040
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
java    28021   df  195u  IPv6 3580602      0t0  TCP server1.mydomain.com:ampify (LISTEN)

Answer 1:

Under the default configuration that Hadoop ships with, port 8040 is the port that the NodeManager uses for the localizer. This is a server endpoint responsible for bringing the files required to run a container onto the local node. (For example, this can be a MapReduce job's jar file or distributed cache files.)

If another server on the machine (shown here as Ampify) is legitimately bound to port 8040 and you don't want to stop that service, you can reconfigure the port that the NodeManager uses for the localizer. Set the property yarn.nodemanager.localizer.address in your yarn-site.xml file. This is documented here:

http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Taken from the XML source in the Hadoop tree, here is the default definition and description of the property:

<property>
  <description>Address where the localizer IPC is.</description>
  <name>yarn.nodemanager.localizer.address</name>
  <value>${yarn.nodemanager.hostname}:8040</value>
</property>
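
As a minimal sketch of that override, the entry in yarn-site.xml on the affected node could look like the following. The port 8041 is only an assumption for illustration, not a recommended value; substitute any port that is unused on your machine. If your yarn-site.xml already has a configuration element, add only the property element to it.

```xml
<!-- yarn-site.xml on the node where port 8040 is already taken -->
<configuration>
  <property>
    <name>yarn.nodemanager.localizer.address</name>
    <!-- 8041 is an assumed free port; replace it with any unused port -->
    <value>${yarn.nodemanager.hostname}:8041</value>
  </property>
</configuration>
```

Restart the NodeManager on that node afterwards so that the new address takes effect.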


Answer 2:

The above error means that you are trying to start a process on port 8040, which is already in use by another process.

To get rid of this error, kill the process that is currently listening on port 8040. Your lsof output shows that its PID is 28021. Kill it with the following command, then start the NodeManager again:

kill -9 28021