I'm trying to set up a Hadoop 3 cluster.
I have two questions about the erasure coding feature:
- How can I ensure that erasure coding is enabled?
- Do I still need to set the replication factor to 3?
Please indicate the relevant configuration properties for erasure coding and replication, so that I get the same data safety as Hadoop 2 (replication factor 3) with the disk space benefits of Hadoop 3 erasure coding (only 50% overhead instead of 200%).
In Hadoop 3, an erasure coding policy can be applied to any directory
in HDFS. By default, no directory uses erasure coding; you turn it on for a directory by running the setPolicy
command with the path of that directory.
1:
To check whether erasure coding is enabled for a directory, run the getPolicy
command on it.
2:
In Hadoop 3, the replication factor setting applies only to directories that do not have an erasure coding policy set with setPolicy. You can use both erasure coding and replication in a single cluster.
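To relate this to the overhead figures in the question, here is a small shell sketch of the arithmetic. The two function names are made up for illustration, and the policy names are the built-in ones I would expect -listPolicies to report on a stock Hadoop 3 install (check your own cluster). Overhead for an erasure coding policy is parity units divided by data units; for replication it is the number of extra copies.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop: plain integer arithmetic only.

# ec_overhead_pct DATA_UNITS PARITY_UNITS
# Extra storage of an erasure coding policy, as an integer percentage.
ec_overhead_pct() {
    echo $(( $2 * 100 / $1 ))
}

# replication_overhead_pct FACTOR
# Extra storage of N-way replication: N-1 additional full copies.
replication_overhead_pct() {
    echo $(( ($1 - 1) * 100 ))
}

echo "replication x3: $(replication_overhead_pct 3)% overhead, survives 2 lost copies"
echo "XOR-2-1:        $(ec_overhead_pct 2 1)% overhead, survives 1 lost block"
echo "RS-6-3:         $(ec_overhead_pct 6 3)% overhead, survives 3 lost blocks"
echo "RS-10-4:        $(ec_overhead_pct 10 4)% overhead, survives 4 lost blocks"
```

Note that XOR-2-1 and RS-6-3 both cost 50%, but only RS-6-3 tolerates at least as many failures as replication factor 3. If the goal is the same safety as Hadoop 2, RS-6-3-1024k is probably the policy to look at rather than XOR-2-1-1024k.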
Command to list the supported erasure coding policies:
./bin/hdfs ec -listPolicies
Command to enable the XOR-2-1-1024k erasure coding policy:
./bin/hdfs ec -enablePolicy -policy XOR-2-1-1024k
Command to set an erasure coding policy on an HDFS directory:
./bin/hdfs ec -setPolicy -path /tmp -policy XOR-2-1-1024k
Command to get the policy set on a given directory:
./bin/hdfs ec -getPolicy -path /tmp
Command to remove the policy from the directory (i.e. unset the policy):
./bin/hdfs ec -unsetPolicy -path /tmp
Command to disable the policy:
./bin/hdfs ec -disablePolicy -policy XOR-2-1-1024k
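Putting the enable, set and get commands together, a minimal sketch of a wrapper might look like the following. The function name apply_ec_policy, the HDFS_CMD variable and the /data/cold path are all assumptions made for this example; only the hdfs ec subcommands themselves come from the commands above.

```shell
#!/bin/sh
# HDFS_CMD is an assumption of this sketch: set it to ./bin/hdfs (or a full
# path) if the hdfs launcher is not on your PATH.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# apply_ec_policy DIR POLICY   (hypothetical helper, not a Hadoop command)
apply_ec_policy() {
    dir="$1"
    policy="$2"
    # Make the policy available cluster-wide.
    $HDFS_CMD ec -enablePolicy -policy "$policy"
    # Attach the policy to the directory; stop here if that fails.
    $HDFS_CMD ec -setPolicy -path "$dir" -policy "$policy" || return 1
    # Print the policy now in effect so the result can be checked by eye.
    $HDFS_CMD ec -getPolicy -path "$dir"
}

# Example call (placeholder path, directory must already exist):
# apply_ec_policy /data/cold RS-6-3-1024k
```

Keep in mind that setting a policy on a directory only affects files written afterwards. Files that already exist there keep their current layout until they are rewritten, for example by copying them into the directory again.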