Question:
If the replication factor is changed in the cluster, say, from 5 to 3, and the cluster is restarted, what happens to the old file blocks? Will they be considered over-replicated and get deleted, or does the replication factor apply only to new files? That would mean old file blocks are replicated 5 times and new file blocks (created after the restart) are replicated 3 times. What happens if the cluster is not restarted?
Answer 1:
If the replication factor is changed in the cluster, say, from 5 to 3, and the cluster is restarted, what happens to the old file blocks?
Nothing happens to existing/old file blocks.
Will they be considered over-replicated and get deleted, or does the replication factor apply only to new files?
The new replication factor only applies to new files, as the replication factor is not an HDFS-wide setting but a per-file attribute.
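To make the per-file nature concrete, here is a minimal sketch using the standard org.apache.hadoop.fs Java API (the class name and path are made up for illustration): the replication factor is passed to create() and stored with the file, independently of whatever the cluster-wide default later becomes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PerFileReplicationDemo {
        public static void main(String[] args) throws Exception {
            // Picks up the client-side core-site.xml / hdfs-site.xml on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The replication factor is an argument of create(), i.e. a per-file attribute.
            Path p = new Path("/tmp/demo-rf5.txt");            // hypothetical path
            try (FSDataOutputStream out = fs.create(
                    p,
                    true,                                      // overwrite if it exists
                    conf.getInt("io.file.buffer.size", 4096),  // write buffer size
                    (short) 5,                                 // replication factor for THIS file only
                    fs.getDefaultBlockSize(p))) {
                out.writeUTF("hello");
            }

            // The factor is stored with the file and is what the NameNode maintains for it,
            // regardless of what the cluster-wide default is later changed to.
            System.out.println("Replication of " + p + ": "
                    + fs.getFileStatus(p).getReplication());   // prints 5
        }
    }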
That would mean old file blocks are replicated 5 times and new file blocks (created after the restart) are replicated 3 times.
That is exactly right. Existing files created with a replication factor of 5 will continue to carry 5 replicas per block, while new files created under the lower default will carry 3.
What happens if the cluster is not restarted?
Nothing changes whether you restart the cluster or not. Since the property is per-file and is supplied by the client when it creates a file, a cluster restart isn't required to change this setting either; you only need to update your client configs.
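Because the default is read from the client's configuration at file-creation time, "updating your client configs" can be as simple as the following sketch (no NameNode or DataNode restart involved; the class name and path are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ClientSideDefaultDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same effect as editing dfs.replication in the client's hdfs-site.xml:
            // files created by this client from now on default to 3 replicas.
            conf.setInt("dfs.replication", 3);

            FileSystem fs = FileSystem.get(conf);
            try (FSDataOutputStream out = fs.create(new Path("/tmp/new-file.txt"))) {  // hypothetical path
                out.writeUTF("created with the new default of 3 replicas");
            }
        }
    }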
If you want to change the replication factor of all your old files as well, run the setrep command, e.g.: hadoop fs -setrep -R 3 /
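If you would rather do the same thing from application code, the rough Java equivalent is FileSystem.setReplication(); a sketch with a hypothetical path follows. Note that unlike -setrep -R it acts on a single file, so you would walk the directory tree yourself (e.g. with listFiles(path, true)) to mimic the recursive flag.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChangeReplicationDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Ask the NameNode to change this file's replication factor to 3.
            // Excess replicas are removed asynchronously in the background.
            Path p = new Path("/data/old-file.txt");   // hypothetical path
            boolean accepted = fs.setReplication(p, (short) 3);
            System.out.println("Replication change accepted: " + accepted);
        }
    }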
Answer 2:
If you change the replication factor in the config file and restart the cluster, the old file blocks continue to keep 5 copies. That's because a change to the replication factor in the config file only applies to new files, i.e. files that are created afterwards.
To make sure the replication factor of existing files is reduced from 5 to 3 and the over-replicated copies are deleted, use the setrep command of the hadoop fs/hdfs dfs utility: hdfs dfs -setrep -R 3 /
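If you want to verify that the extra copies were actually trimmed after running setrep (deletion happens asynchronously, so it may take a little while), one way is to compare the recorded replication factor with the number of DataNodes actually holding each block. A sketch, again with a hypothetical path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckReplicasDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/data/old-file.txt");          // hypothetical path

            FileStatus st = fs.getFileStatus(p);
            System.out.println("Target replication: " + st.getReplication());

            // Each BlockLocation lists the DataNodes currently holding a replica of that block.
            for (BlockLocation bl : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println("Block at offset " + bl.getOffset()
                        + " currently has " + bl.getHosts().length + " replicas");
            }
        }
    }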