Is it possible to store the output of the hadoop dfs -getmerge command on another machine?
The reason is that there is not enough space on my local machine: the job output is 100GB and my local storage is only 60GB.
Another reason is that I may want to process the output with another program on a different machine, and I don't want to transfer it twice (HDFS -> local FS -> remote machine). I just want HDFS -> remote machine.
I am looking for something that works like scp, e.g.:
hadoop dfs -getmerge /user/hduser/Job-output user@someIP:/home/user/
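One possible workaround, sketched under a few assumptions (a Unix shell, ssh access to the target machine, and a job that wrote its results as part-* files): instead of relying on -getmerge to accept a remote target, stream the files with hadoop fs -cat and pipe them through ssh, so nothing is written to the local disk. The function name hdfs_getmerge_to is hypothetical, and DEST must name a file, not a directory.

```shell
#!/usr/bin/env bash
# Hypothetical scp-style helper: concatenate the part-* files of an HDFS
# directory into DEST, where DEST is either a local file or USER@HOST:FILE.
# Assumes ssh access to the remote host and a DEST without single quotes.
# Usage: hdfs_getmerge_to HDFS_DIR DEST
hdfs_getmerge_to() (
  set -o pipefail   # report failure if either hadoop or ssh fails
  src=$1
  dest=$2
  case $dest in
    *:*)
      host=${dest%%:*}   # part before the first colon, e.g. user@someIP
      path=${dest#*:}    # part after it, e.g. /home/user/Job-output.txt
      # The quoted glob is expanded by HDFS, not by the local shell; the
      # data flows from HDFS through ssh into the remote file.
      hadoop fs -cat "${src}/part-*" | ssh "$host" "cat > '${path}'"
      ;;
    *)
      # No host given: write to a local file instead.
      hadoop fs -cat "${src}/part-*" > "$dest"
      ;;
  esac
)
```

For example: hdfs_getmerge_to /user/hduser/Job-output user@someIP:/home/user/Job-output.txt. As with scp, a local path containing a colon would be mistaken for a remote one, and only the part-* files are concatenated.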
I would also like to be able to get HDFS data from a remote host onto my local machine.
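For that reverse direction, a similar sketch, assuming the remote host has a Hadoop client configured for the cluster and available on the PATH of non-interactive ssh sessions: hadoop runs on the remote host and its output is redirected into a local file. getmerge_from_remote is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the part-* files of an HDFS directory via a
# remote host (assumed to have a working Hadoop client) into a local file.
# Usage: getmerge_from_remote USER@HOST HDFS_DIR LOCAL_FILE
getmerge_from_remote() {
  local target=$1 src=$2 dest=$3
  # Single quotes stop the remote shell from expanding part-*, so hadoop
  # receives the glob and expands it against HDFS.
  ssh "$target" "hadoop fs -cat '${src}/part-*'" > "$dest"
}
```

For example: getmerge_from_remote user@someIP /user/hduser/Job-output Job-output.txt.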
Could Unix pipes be used here?
For those who are not familiar with Hadoop: I am just looking for a way to replace the local destination parameter in this command (the path that normally follows the HDFS source directory /user/hduser/Job-output) with a directory on a remote machine.