How to select the block placement policy in the DataNode

Posted 2019-09-10 06:07

Question:

The block replication factor in my Hadoop cluster is 3, and every DataNode has 3 ${dfs.data.dir} directories. When a DataNode is chosen to store a block, is the block stored in all 3 directories or in only one of them?

If the latter, how is the ${dfs.data.dir} directory chosen?

Answer 1:

Each replica is stored in only one of the directories. When a block arrives at the DataNode, the directory is chosen in a round-robin manner. You can change this behavior by setting dfs.datanode.fsdataset.volume.choosing.policy to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy; the directory is then chosen based on the space available in each one (see the configuration reference here: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml).
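To make the two behaviors concrete, here is a simplified Python model. It is not Hadoop's actual Java code. Volume, RoundRobinChooser, choose_most_available and place_blocks are hypothetical names made up for this sketch, and the 128 MB block size in the usage is an assumption. The model shows each block landing in exactly one directory, with round robin cycling through the directories in order and skipping any that lack room. For comparison it also includes a naive "pick the volume with the most free space" rule. The real AvailableSpaceVolumeChoosingPolicy is more nuanced than that rule: it also uses a balanced-space threshold and a preference fraction, both listed on the hdfs-default.xml page linked above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Volume:
    """Hypothetical stand-in for one ${dfs.data.dir} directory."""
    path: str
    available: int  # free bytes on this volume


class RoundRobinChooser:
    """Simplified round-robin choice: cycle through the volumes in order,
    skipping any volume that cannot hold the block."""

    def __init__(self) -> None:
        self._next = 0  # index of the volume to try first on the next call

    def choose(self, volumes: List[Volume], block_size: int) -> Volume:
        for _ in range(len(volumes)):
            idx = self._next % len(volumes)
            self._next = idx + 1
            if volumes[idx].available >= block_size:
                return volumes[idx]
        raise RuntimeError("no volume has enough space for the block")


def choose_most_available(volumes: List[Volume], block_size: int) -> Volume:
    """Naive space-based choice (NOT Hadoop's exact algorithm):
    pick the volume with the most free space."""
    candidate = max(volumes, key=lambda v: v.available)
    if candidate.available < block_size:
        raise RuntimeError("no volume has enough space for the block")
    return candidate


def place_blocks(choose: Callable[[List[Volume], int], Volume],
                 volumes: List[Volume], block_size: int, count: int) -> List[str]:
    """Store `count` blocks one at a time; each block goes to exactly one volume."""
    placed = []
    for _ in range(count):
        vol = choose(volumes, block_size)
        vol.available -= block_size  # the chosen directory now holds the block
        placed.append(vol.path)
    return placed


if __name__ == "__main__":
    block = 128 * 1024 * 1024  # assumed 128 MB block size
    vols = [Volume("/data1", 10 * block),
            Volume("/data2", 10 * block),
            Volume("/data3", 2 * block)]
    print(place_blocks(RoundRobinChooser().choose, vols, block, 4))
    # ['/data1', '/data2', '/data3', '/data1']
    print(place_blocks(choose_most_available, vols, block, 1))
    # ['/data2']
```

The round-robin chooser remembers its position between calls, which is why it is modeled as an object, while the naive space-based rule only needs the current free space and can be a plain function.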