How does HDFS choose a datanode to store a file?

As the title indicates, when a client requests to write a file to HDFS, how does HDFS or the name node choose which datanode stores the file? Does HDFS try to store all the blocks of the file on the same node, or, if the file is too big for that, on nodes in the same rack? Does HDFS provide any API that lets an application store a file on a datanode of its choosing?

Tags: hadoop hdfs

5 answers

家丑人穷心不美

#2 · 2019-01-23 08:31

The code for choosing datanodes is in the function ReplicationTargetChooser.chooseTarget().

The comment there says:

The replica placement strategy is that if the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode. The 2nd replica is placed on a datanode that is on a different rack. The 3rd replica is placed on a datanode which is on the same rack as the first replica.

HDFS doesn't provide any API for applications to store a file on the datanode they want.
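The placement strategy quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as the comment states it, not the Hadoop code itself: `DataNode` and `choose_targets` are hypothetical names, and the sketch assumes the cluster has at least two racks and at least two nodes on the first replica's rack (it does not model the fallbacks a real cluster needs when those conditions fail).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str


def choose_targets(nodes, writer=None, rng=random):
    """Pick three replica targets following the quoted comment.

    `writer` is the DataNode the client runs on, or None if the client
    is outside the cluster.
    """
    # 1st replica: the writer's own node if it is a datanode, else random.
    first = writer if writer in nodes else rng.choice(nodes)

    # 2nd replica: any datanode on a different rack than the first.
    off_rack = [n for n in nodes if n.rack != first.rack]
    second = rng.choice(off_rack)

    # 3rd replica: same rack as the first replica, but a different node.
    same_rack = [n for n in nodes if n.rack == first.rack and n != first]
    third = rng.choice(same_rack)

    return [first, second, third]


# Example: a hypothetical four-node, two-rack cluster with the writer on dn1.
cluster = [
    DataNode("dn1", "rackA"),
    DataNode("dn2", "rackA"),
    DataNode("dn3", "rackB"),
    DataNode("dn4", "rackB"),
]
targets = choose_targets(cluster, writer=cluster[0])
```

With the writer on dn1, the first target is always dn1, the second is one of the rackB nodes, and the third is dn2.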


你好瞎i

#3 · 2019-01-23 08:32

With the Hadoop-385 patch, we can now choose the block placement policy, for example to place all blocks of a file on the same node (and similarly for the nodes holding the replicas). Read this blog about the topic, and look at its comments section.
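To make the idea of a selectable placement policy concrete, here is a toy sketch in Python. It is not the Hadoop interface (there the policy is a Java class); `default_policy`, `same_node_policy` and `place_file` are hypothetical names, and pinning by a hash of the file path is just one assumed way to keep all blocks of a file together.

```python
import hashlib
import random


def default_policy(file_path, block_index, nodes, rng=random):
    """Spread blocks: each block may land on a different node."""
    return rng.choice(nodes)


def same_node_policy(file_path, block_index, nodes, rng=random):
    """Pin every block of a file to one node, derived from the file path."""
    digest = hashlib.sha256(file_path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place_file(file_path, num_blocks, nodes, policy):
    """Ask the selected policy for a target node for each block."""
    return [policy(file_path, i, nodes) for i in range(num_blocks)]


# Example: five blocks of one file, all mapped to a single node.
nodes = ["dn1", "dn2", "dn3", "dn4"]
placed = place_file("/data/big.log", 5, nodes, same_node_policy)
```

Swapping `same_node_policy` for `default_policy` changes where blocks go without touching `place_file`, which is the point of making the policy pluggable.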


我想做一个坏孩纸

#4 · 2019-01-23 08:35

If you prefer charts, here is a picture (source): [image not included]


虎瘦雄心在

#5 · 2019-01-23 08:37

"How does HDFS or the name node choose which datanode stores the file?"

HDFS has a BlockPlacementPolicyDefault; check the API documentation for more details. It should be possible to extend BlockPlacementPolicy for custom behavior.

"Does HDFS provide any API that lets an application store a file on a datanode of its choosing?"

The placement behavior should not be specific to a particular datanode. That's what makes HDFS resilient to failure and also scalable.


啃猪蹄的小仙女

#6 · 2019-01-23 08:43

This image shows how the replication process is done: [image not included]

You can see that the namenode instructs the datanodes where to store the data: the first replica is stored on the local machine, the other two replicas are made on another rack, and so on.

If any replica fails, the data is restored from another replica. The chance of every replica failing is like the chance of a fan falling on your head while you sleep :p i.e. it is very unlikely.
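As a rough back-of-the-envelope check of that claim, here is a tiny Python sketch. It assumes node failures are independent, which is a simplification (a whole rack can fail at once, which is why replicas are spread across racks), and the 1% failure probability is a made-up figure for illustration, not a measurement. `prob_all_replicas_lost` is a hypothetical helper name.

```python
def prob_all_replicas_lost(p_node_failure, replicas=3):
    """Probability that every replica is lost, assuming independent failures."""
    if not 0.0 <= p_node_failure <= 1.0:
        raise ValueError("p_node_failure must be between 0 and 1")
    return p_node_failure ** replicas


# Example: with an assumed 1% chance of losing any one node,
# losing all three replicas at once is roughly a one-in-a-million event.
p_loss = prob_all_replicas_lost(0.01)
```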

