I have cluster of 4 datanodes and hdfs structure on each node is as below
I am facing disk space issue , as you can see the /tmp folder from hdfs has occupied more space(217GB). So i tried to investigate the data from /tmp folder. I found following temp files. I accessed these temp folders each contains some part files of 10gb to 20 gb in size.
I want to clear this /tmp directory. can anyone please let me know the consequences of deleting these tmp folders or part files. Will it affect my cluster?
HDFS /tmp directory mainly used as a temporary storage during mapreduce operation. Mapreduce artifacts, intermediate data etc will be kept under this directory. These files will be automatically cleared out when mapreduce job execution completes. If you delete this temporary files, it can affect the currently running mapreduce jobs.
Temporary files are created by pig. Temporary files deletion happens at the end. Pig does not handle temporary files deletion if the script execution failed or killed. Then you have to handle this situation. You better handle this temporary files clean up activity in the script itself.
Following article gives you a good understanding
http://www.lopakalogic.com/articles/hadoop-articles/pig-keeps-temp-files/