dfs_hosts_allow in Cloudera Manager

Posted 2019-08-12 21:06

Question:

I am trying to set up HDFS and Cloudera Manager via the Cloudera Manager API. However, I am stuck at a specific point:

I set up all the HDFS roles, but the NameNode refuses to communicate with the DataNodes. The relevant error from the DataNode log:

Initialization failed for Block pool BP-1653676587-172.168.215.10-1435054001015 (Datanode Uuid null) service to master.adastragrp.com/172.168.215.10:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(172.168.215.11, datanodeUuid=1a114e5d-2243-442f-8603-8905b988bea7, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster4;nsid=103396489;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:917)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5085)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1140)
    at 

My DNS is configured via the hosts file, so I thought the following answer applied and tried its solution, without success: https://stackoverflow.com/a/29598059/1319284

However, I have another small cluster with basically the same configuration as far as I can tell, which is working. DNS is configured through /etc/hosts as well, but here I set up the cluster via Cloudera Manager GUI instead of the API.

Eventually I found the configuration directory of the running NameNode process, which contains a dfs_hosts_include file. That file lists only 127.0.0.1, whereas on the working cluster it lists all the nodes. topology.map shows a similar oddity:

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<topology>
  <node name="master.adastragrp.com" rack="/default"/>
  <node name="127.0.0.1" rack="/default"/>
  <node name="slave.adastragrp.com" rack="/default"/>
  <node name="127.0.0.1" rack="/default"/>
</topology>
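
A quick way to spot this symptom in a generated topology.map is to scan it for loopback entries. The following is only a minimal sketch: the function name loopback_nodes and the embedded sample are hypothetical, and it assumes the file has the <topology>/<node name="..."> layout shown above.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Sample mirroring the topology.map above (XML declaration omitted).
SAMPLE = """
<topology>
  <node name="master.adastragrp.com" rack="/default"/>
  <node name="127.0.0.1" rack="/default"/>
  <node name="slave.adastragrp.com" rack="/default"/>
  <node name="127.0.0.1" rack="/default"/>
</topology>
"""

def loopback_nodes(topology_xml):
    """Return the names of <node> entries whose name is a loopback IP."""
    root = ET.fromstring(topology_xml)
    suspicious = []
    for node in root.iter("node"):
        name = node.get("name", "")
        try:
            if ipaddress.ip_address(name).is_loopback:
                suspicious.append(name)
        except ValueError:
            # Not an IP literal (for example a hostname); skip it.
            continue
    return suspicious

if __name__ == "__main__":
    print(loopback_nodes(SAMPLE))
```

Any loopback address in the result suggests that a host registered itself with Cloudera Manager under 127.0.0.1 rather than its real IP.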

That doesn't look right: 127.0.0.1 appears twice where the hosts' real IPs should be. On the working cluster, the IPs are as expected.

Not only do I not know what went wrong, I also do not know how to influence these files, as they are all auto-generated by Cloudera Manager. Has anyone seen this before and could provide guidance here?

Answer 1:

I finally found the problem: it was in /etc/cloudera-scm-agent/config.ini

I generated this file from a template and ended up with

listening_ip=127.0.0.1

which the cloudera-scm-agent happily reported to the server. For more information, see the question "Salt changing /etc/hosts, but still caching old one?"
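
To catch this before starting the agent, a templated config.ini can be checked for a loopback listening_ip. This is a rough sketch rather than anything Cloudera ships: the function name loopback_listening_ip is hypothetical, and it assumes the file is a regular INI file with section headers (the [General] header in the example is such an assumption).

```python
import configparser
import ipaddress

def loopback_listening_ip(config_text):
    """Return listening_ip if it is set to a loopback address, else None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    # Check every section, since the section holding listening_ip is assumed.
    for section in parser.sections():
        value = parser.get(section, "listening_ip", fallback=None)
        if not value:
            continue  # option missing or left empty
        try:
            if ipaddress.ip_address(value.strip()).is_loopback:
                return value.strip()
        except ValueError:
            # Value is not an IP literal; nothing to flag here.
            continue
    return None

if __name__ == "__main__":
    example = "[General]\nlistening_ip=127.0.0.1\n"
    print(loopback_listening_ip(example))
```

In practice the text would be read from /etc/cloudera-scm-agent/config.ini; a non-None result means the agent would report a loopback address to the server, as happened here.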