How can I increase the configured capacity of my hadoop DFS from the default 50GB to 100GB?
My present setup is Hadoop 1.2.1 running on a CentOS 6 machine with 120GB of 450GB used. I have set up Hadoop in pseudo-distributed mode with the /conf suggested by "Hadoop: The Definitive Guide", 3rd edition.
hdfs-site.xml had only one configured property:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
The following command gave no error feedback; it just returned to the prompt:
hadoop dfsadmin -setSpaceQuota 100g /tmp/hadoop-myUserID
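For what it is worth, the quota state on that path can be inspected with the command below (same path as above; the third column of the output is the space quota, or "none" if no quota is set):
hadoop fs -count -q /tmp/hadoop-myUserID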
If I am in a regen loop (I have executed
rm -rf /tmp/hadoop-myUserId
in an attempt to "start from scratch"), this seeming success of the setSpaceQuota occurs if and only if I have executed
start-all.sh
hadoop namenode -format
The failure of my DFS capacity configuration is shown by
hadoop dfsadmin -report
which still reports the same 50GB of configured capacity.
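The specific line I am looking at can be pulled out with, for example:
hadoop dfsadmin -report | grep 'Configured Capacity'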
I would be willing to switch over to Hadoop 2.2 (now the stable release) if that is currently the best way to get 100GB of configured HDFS capacity.
It seems like there should be a configuration property in hdfs-site.xml that would allow me to use more of my free partition.
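For illustration only, the kind of property I imagined is something like dfs.data.dir (it does exist in hadoop 1.x, but I have not confirmed it is the right knob, and the path below is just a guess):
<property>
  <name>dfs.data.dir</name>
  <!-- hypothetical datanode storage location on the larger partition -->
  <value>/home/myUserID/hdfs/data</value>
</property>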
Set the location of HDFS to a partition with more free space.
For hadoop-1.2.1 this can be done by setting the hadoop.tmp.dir in
hadoop-1.2.1/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/myUserID/hdfs</value>
    <description>base location for other hdfs directories.</description>
  </property>
</configuration>
Running
df
showed that my /home partition was the rest of my hard disk, minus the 50GB allotted to my / (root) partition. The default location for HDFS is
/tmp/hadoop-myUserId
which lives in the / partition. This is where my initial 50GB HDFS size came from.
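For reference, the two partitions and their sizes can be checked directly (the mount points below are from my machine and may differ on yours):
df -h / /home
# / is the roughly 50GB root partition; /home has the rest of the disk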
Creating a directory for HDFS and confirming which partition it lives on was accomplished by
mkdir ~/hdfs
df -P ~/hdfs | tail -1 | cut -d' ' -f 1
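As an optional cross-check, running the same one-liner against the old default location should print a different device:
df -P /tmp | tail -1 | cut -d' ' -f 1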
Successful implementation was then accomplished by running
stop-all.sh
start-dfs.sh
hadoop namenode -format
start-all.sh
hadoop dfsadmin -report
which reports the size of HDFS as the size of my /home partition.
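As a further sanity check, the HDFS files should now appear under the new location; with the hadoop 1.x defaults relative to hadoop.tmp.dir I would expect something like this (treat it as a sketch):
ls ~/hdfs/dfs
# e.g. name and data subdirectories created by the format and the datanode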
Thank you jtravaglini for the comment/clue.
Stop all the services: stop-all.sh
then add these properties to hdfs-site.xml to increase the storage size:
<property>
  <!-- enable the intra-datanode disk balancer -->
  <name>dfs.disk.balancer.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- allow storage policies (e.g. SSD tiers) to be used -->
  <name>dfs.storage.policy.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS block size: 134217728 bytes = 128 MB -->
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
<property>
  <!-- number of namenode RPC handler threads -->
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>
<property>
  <!-- where the namenode keeps its metadata -->
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <!-- comma-separated datanode storage directories; the second entry adds a directory on another disk -->
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/hadoop_store/hdfs/datanode,[disk]file:///hadoop_store2/hdfs/datanode</value>
</property>
Also remember to prefix [disk] when including an extra disk directory and [ssd] for a dedicated extra SSD drive, and always double-check the triple slash ("///") in the file:// paths.
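For example, a dfs.datanode.data.dir value mixing a plain disk directory with a hypothetical SSD mount (the /mnt/ssd1 path is only an illustration) could look like:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[disk]file:///usr/local/hadoop_store/hdfs/datanode,[ssd]file:///mnt/ssd1/hdfs/datanode</value>
</property>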
After that, format the namenode so the new settings are picked up by the Hadoop cluster, by giving the command
hadoop namenode -format
then start the services from the beginning:
start-all.sh
"/* remember without formating the hdfs the setting will not be activated as it will search for the Blockpool Id (BP_ID) in dfs.datanode.data.dir, and for the new location it will not found any BP_ID. "/*