I would like to edit a text file directly in HDFS using VI without having to copy it to local, edit it and then copy it back from local. Is this possible?
Edit: This used to be possible in Cloudera's Hue UI but is no longer the case.
Other answers here are correct, you can't edit files in HDFS as it is not a POSIX-compliant filesystem. Only appends are possible.
That said, I recently had to fix a header in an HDFS file, and this is the best approach I came up with.

It is Spark (PySpark) code. Note the coalesce(1): the job is no longer parallel, but the benefit is that you get a single output file. Then just move/rename the file from the "orig_file + '_fixed'" directory to overwrite the original file.

ps. You could omit the .coalesce(1) part and the conversion would run in parallel (assuming a big file with multiple splits) and be much faster, but then you would have to merge the output HDFS files into one.

pps. The "map" call in the pipeline fixes the headers through a "fix_header" function (not shown here for clarity).
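For illustration, a minimal sketch of what such a pipeline might look like. The `fix_header` below is a hypothetical stand-in (the answer does not show the real one), and the file paths are made up; the Spark calls themselves need a running cluster, so they appear as comments next to a plain-Python demonstration of the per-line map:

```python
def fix_header(line):
    # Hypothetical stand-in for the answer's fix_header (not shown there).
    # This version lower-cases a commented header line and normalizes the
    # delimiter, leaving ordinary data lines untouched.
    if line.startswith("#"):
        return line.lstrip("#").strip().lower().replace(";", ", ")
    return line

# The Spark pipeline from the answer would look roughly like this
# (requires a cluster, so shown as comments):
#
#   rdd = sc.textFile(orig_file)              # read lines from HDFS
#   fixed = rdd.map(fix_header)               # fix the header line(s)
#   fixed.coalesce(1) \
#        .saveAsTextFile(orig_file + '_fixed')  # one partition -> one file
#
# The same map, demonstrated on an in-memory list of lines:
lines = ["# COL_A;COL_B;COL_C", "1;2;3", "4;5;6"]
fixed = [fix_header(line) for line in lines]
print(fixed[0])  # header rewritten; data lines pass through unchanged
```

After the job finishes, you would `hadoop fs -mv` the single part file over the original.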
There are a couple of options you could try that allow you to mount HDFS on your local machine, after which you can use local system commands like cp, rm, cat, mv, mkdir, rmdir, more, etc. But neither of them supports random write operations; they only support appends.

The NFS Gateway uses NFS v3 and supports appending to a file, but it cannot perform random write operations.

And regarding your comment on Hue: Hue is probably downloading the file to a local buffer and, after editing, replacing the original file in HDFS.
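As a sketch of the NFS Gateway route (the hostname and mount point are placeholders, and the gateway daemons must already be running on the cluster side):

```shell
# Mount HDFS through the NFS Gateway (NFSv3); <gateway_host> is a placeholder.
sudo mkdir -p /hdfs_mount
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync <gateway_host>:/ /hdfs_mount

# Local commands now work against HDFS paths:
ls /hdfs_mount/user/me
cat /hdfs_mount/user/me/data.txt
echo "extra line" >> /hdfs_mount/user/me/data.txt   # append: supported
vi /hdfs_mount/user/me/data.txt                     # in-place edit: will fail
```

The last line is the catch: an editor like vi saves by rewriting the file in place, which is exactly the random-write operation HDFS does not support.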
A simple way is to copy from and to HDFS and edit locally (see here).
Source code of hvim
A file in HDFS can be replaced using the -f option: hadoop fs -put -f. This eliminates the need to delete and then copy.
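The round trip then becomes (paths are examples):

```shell
hadoop fs -get /data/notes.txt .             # copy the file out of HDFS
vi notes.txt                                 # edit it locally
hadoop fs -put -f notes.txt /data/notes.txt  # -f overwrites; no delete needed
```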
Files in HDFS can't be edited directly. You can't even replace a file in place; the only way is to delete the file and upload a new one.

Edit the file locally and copy it back into HDFS. Don't forget to delete the old file first if you want to keep the same name.