I am trying to understand where Hadoop stores data in HDFS. I have been referring to the config files, viz. core-site.xml and hdfs-site.xml.
The properties I have set are:

In core-site.xml:

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/tmp</value>
</property>
```

In hdfs-site.xml:

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/hadoop/hdfs/datanode</value>
</property>
```
With the above arrangement, the data blocks should be stored in the directory given by dfs.datanode.data.dir. Is this correct?
I referred to the Apache Hadoop documentation, and I see this:

In core-default.xml:

hadoop.tmp.dir --> A base for other temporary directories.

In hdfs-default.xml:

dfs.datanode.data.dir --> Determines where on the local filesystem a DFS data node should store its blocks. The default value for this property is file://${hadoop.tmp.dir}/dfs/data.
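As far as I understand, the ${hadoop.tmp.dir} reference inside the default value is plain variable substitution. A small sketch of how I picture that expansion working (the resolve helper is mine, not an actual Hadoop API):

```python
# Sketch of how ${...} references in default config values appear to be
# expanded. resolve() is a hypothetical helper, not Hadoop code.
import re

def resolve(value, conf):
    # Replace each ${key} occurrence with the configured value for key.
    return re.sub(r"\$\{([^}]+)\}", lambda m: conf[m.group(1)], value)

conf = {"hadoop.tmp.dir": "/hadoop/tmp"}
print(resolve("file://${hadoop.tmp.dir}/dfs/data", conf))
# file:///hadoop/tmp/dfs/data
```

So with my core-site.xml setting, the default would expand to a path under /hadoop/tmp, which matches the directory I describe seeing below.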
Since I explicitly provided a value for dfs.datanode.data.dir (in hdfs-site.xml), does that mean the data will be stored in that location? If so, would dfs/data be appended to ${dfs.datanode.data.dir}, i.e. would it become /hadoop/hdfs/datanode/dfs/data?
However, I did not see this directory structure being created.
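My mental model of the lookup, which may be wrong, is that a site-file value replaces the default wholesale, so the /dfs/data suffix belongs only to the default string. A sketch of that understanding (not actual Hadoop code):

```python
# Sketch of site-over-default property lookup; not actual Hadoop code.
defaults = {"dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data"}
site = {"dfs.datanode.data.dir": "file:/hadoop/hdfs/datanode"}

def get(key):
    # A value from the site file replaces the default entirely;
    # nothing from the default value is appended to it.
    return site.get(key, defaults[key])

print(get("dfs.datanode.data.dir"))
# file:/hadoop/hdfs/datanode  (used as-is, no /dfs/data suffix)
```

If that model is right, then seeing blocks under /hadoop/tmp/dfs/data would suggest my hdfs-site.xml value is not being picked up at all.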
One observation from my environment: after I run some MapReduce programs, I see that the directory /hadoop/tmp/dfs/data gets created.
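To see which directory actually received the blocks, I have been checking both candidate locations for block files with a small script like this (assuming DataNode block files keep their usual blk_ name prefix):

```python
# Walk a candidate data directory and collect HDFS block files (blk_*).
import os

def find_block_files(root):
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("blk_"):
                hits.append(os.path.join(dirpath, name))
    return hits

# Compare the explicitly configured directory with the hadoop.tmp.dir one.
for candidate in ("/hadoop/hdfs/datanode", "/hadoop/tmp/dfs/data"):
    print(candidate, len(find_block_files(candidate)))
```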
So I am not sure whether the data is actually stored in the directory given by the dfs.datanode.data.dir property. Has anyone had a similar experience?