I have been playing with Cloudera: I define the size of the cluster before I start my job, then use Cloudera Manager to make sure everything is running.
I’m working on a new project that uses message queues instead of Hadoop to distribute the work, but stores the results of that work in HBase. I might launch 10 servers to process the job and write to HBase, but I’m wondering: if I later decide to add a few more worker nodes, can I easily (read: programmatically) make them connect to the running cluster automatically, so they can locally contribute to the cluster's HBase/HDFS?
Is this possible and what would I need to learn in order to do it?
It can be done without restarting the Hadoop cluster. As per this document, you can achieve it by adding the nodes to the includes file and making some changes to the hdfs-site.xml and mapred-site.xml files. Detailed instructions are given in that document.
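For reference, the NameNode-side piece looks roughly like this — the file path here is illustrative, so check the linked document for your distribution's layout:

```xml
<!-- hdfs-site.xml on the NameNode: dfs.hosts points at an include file
     listing every DataNode allowed to join (path is illustrative). -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.include</value>
</property>
```

After appending the new hostname to the include file, `hadoop dfsadmin -refreshNodes` (and `hadoop mradmin -refreshNodes` on the MapReduce side) makes the masters re-read it without a restart.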
Here is the documentation for adding a node to Hadoop, and here for HBase. Looking at the documentation, there is no need to restart the cluster; a node can be added dynamically.
If I understand you correctly, you have workers that you coordinate yourself and that connect to HBase to save their data. You can have as many of those as you need, and they can connect to HBase as they're added (as long as they can see the ZooKeeper quorum).
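In other words, a new worker only needs the client-side HBase configuration pointing at the quorum — something like the following, with illustrative hostnames:

```xml
<!-- hbase-site.xml on each worker: HBase clients locate the cluster
     through ZooKeeper alone, so this is the only required setting
     (hostnames are illustrative). -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```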
If you are talking about growing the Hadoop cluster itself: since you already use Cloudera, you can do that via the Cloudera Manager REST API or the Java client someone implemented for it.
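A minimal sketch of what the programmatic path could look like. The port, API version, and payload shape here are assumptions about the CM REST API, so verify them against the docs for your Cloudera Manager version:

```python
import json

# Build the request for adding hosts to a cluster via the Cloudera Manager
# REST API (POST /api/{version}/clusters/{cluster}/hosts). Port 7180, the
# API version, and the body shape are assumptions -- check your CM docs.
def add_hosts_request(cm_host, cluster, host_ids, api_version="v6"):
    url = "http://%s:7180/api/%s/clusters/%s/hosts" % (cm_host, api_version, cluster)
    body = json.dumps({"items": [{"hostId": h} for h in host_ids]})
    return url, body

url, body = add_hosts_request("cm.example.com", "cluster1", ["worker-11"])
# Send it with any HTTP client, e.g.:
#   requests.post(url, data=body, auth=("admin", "admin"),
#                 headers={"Content-Type": "application/json"})
```

The same API can then trigger role assignment and a cluster refresh, which is how you would script the whole "spin up a node and join it" flow end to end.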
The following steps should help you launch the new node into the running cluster.
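For a classic (Hadoop 1.x-style) cluster, a typical sequence looks roughly like this — script names and paths vary by distribution, so treat it as a sketch rather than a recipe:

```
# On the new node: install the same Hadoop version and copy the
# cluster's conf/ directory from an existing worker.

# On the master: add the new hostname to conf/slaves (and to the
# dfs.hosts include file, if one is configured), then:
hadoop dfsadmin -refreshNodes

# On the new node: start the worker daemons.
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker

# The new DataNode registers with the NameNode automatically; run the
# balancer later if you want existing blocks spread onto it.
```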
If you don't want to restart the services on the NameNode when you add a new node, I would say add the names ahead of time to the slaves configuration file, so they report as decommissioned/dead nodes until they become available, then follow the above DataNode-only steps. Again, this is not best practice.
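For example, the slaves file on the master could be padded with names that don't exist yet (hostnames illustrative):

```
# conf/slaves -- one worker hostname per line
worker-01
worker-02
# pre-registered; these report as dead until the machines come online
worker-11
worker-12
```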