Hadoop has a configuration parameter, hadoop.tmp.dir, which, per the documentation, is "A base for other temporary directories." I presume this path refers to the local file system.
I set this value to /mnt/hadoop-tmp/hadoop-${user.name}. After formatting the namenode and starting all services, I see exactly the same path created on HDFS.
Does this mean hadoop.tmp.dir refers to a temporary location on HDFS?
It's confusing, but hadoop.tmp.dir is used as the base for temporary directories locally, and also in HDFS. The documentation isn't great, but mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", and this defines the path on HDFS where the Map/Reduce framework stores system files.

If you want these not to be tied together, you can edit your mapred-site.xml so that the definition of mapred.system.dir doesn't depend on ${hadoop.tmp.dir}.
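
As a rough sketch of that override, something like the following in mapred-site.xml should decouple the two. The value /hadoop/mapred/system is just a made-up example path, not a recommended or default location; substitute whatever HDFS path suits your cluster.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <!-- Hypothetical HDFS path; note it no longer references ${hadoop.tmp.dir} -->
    <value>/hadoop/mapred/system</value>
  </property>
</configuration>
```

You would most likely need to restart the Map/Reduce daemons for the new value to be picked up.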
Had a look around for information on this one. Only thing I could come up with was this post on the Amazon Elastic MapReduce Dev Guide:
Let me add a bit more to kkrugler's answer:
There are three HDFS properties which contain hadoop.tmp.dir in their default values:

dfs.name.dir: directory where the namenode stores its metadata, with default value ${hadoop.tmp.dir}/dfs/name.
dfs.data.dir: directory where HDFS data blocks are stored, with default value ${hadoop.tmp.dir}/dfs/data.
fs.checkpoint.dir: directory where the secondary namenode stores its checkpoints, with default value ${hadoop.tmp.dir}/dfs/namesecondary.

This is why you saw /mnt/hadoop-tmp/hadoop-${user.name} in your HDFS after formatting the namenode.
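
If you would rather not have these derived from hadoop.tmp.dir, a minimal sketch is to set them explicitly, for example in hdfs-site.xml. The directories below are hypothetical placeholders that I picked for illustration; fs.checkpoint.dir can be overridden in the same way.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- Hypothetical directory for namenode metadata -->
    <value>/mnt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- Hypothetical directory for HDFS data blocks -->
    <value>/mnt/hadoop/dfs/data</value>
  </property>
</configuration>
```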