Opening a file stored in HDFS to edit in VI

Published 2019-04-05 08:10

Question:

I would like to edit a text file directly in HDFS using VI without having to copy it to local, edit it and then copy it back from local. Is this possible?

Edit: This used to be possible in Cloudera's Hue UI but is no longer the case.

Answer 1:

There are a couple of options you could try that allow you to mount HDFS on your local machine and then use your local system commands like cp, rm, cat, mv, mkdir, rmdir, more, etc. Note that neither of them supports random write operations; both support only appends.

  • NFS Gateway
  • Hadoop Fuse

The NFS Gateway uses NFS v3 and supports appending to a file, but it cannot perform random write operations.
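A minimal sketch of the NFS Gateway workflow, assuming a gateway is already running (the hostname, mount point, and paths below are illustrative, not from the original answer):

```shell
# Mount HDFS via the NFS Gateway (NFS v3); options follow the Hadoop docs.
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,sync namenode-host:/ /mnt/hdfs

# Ordinary local commands now work against HDFS:
ls /mnt/hdfs/user/alice
cat /mnt/hdfs/user/alice/notes.txt

# Appends are supported...
echo "appended line" >> /mnt/hdfs/user/alice/notes.txt
# ...but random writes are not, so in-place editors like vi will still fail.
```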

And regarding your comment on Hue: Hue is most likely downloading the file to a local buffer and, after editing, replacing the original file in HDFS.



Answer 2:

A simple way is to copy the file out of HDFS, edit it locally, and copy it back (see here):

hvim <filename>

Source code of hvim

# Stream the file out of HDFS into a local temp file
hadoop fs -text "$1" > hvim.txt
# Edit it locally
vim hvim.txt
# Replace the original: delete it, then upload the edited copy
hadoop fs -rm -skipTrash "$1"
hadoop fs -copyFromLocal hvim.txt "$1"
rm hvim.txt


Answer 3:

A file in HDFS can be replaced using the -f option of hadoop fs -put -f. This eliminates the need to delete and then copy.
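A sketch of the same edit cycle as hvim above, but using -put -f so no explicit delete step is needed (the path here is illustrative):

```shell
# Copy the file out of HDFS, edit it locally, then overwrite in one step.
hadoop fs -get /data/report.txt report.txt
vim report.txt
hadoop fs -put -f report.txt /data/report.txt   # -f replaces the existing file
rm report.txt
```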



Answer 4:

A file in HDFS can't be edited directly. You can't even modify the file in place; the only way is to delete the file and replace it with a new one.

Edit the file locally and copy it back into HDFS. Don't forget to delete the old file first if you want to keep the same name.



Answer 5:

Other answers here are correct: you can't edit files in HDFS because it is not a POSIX-compliant filesystem. Only appends are possible.

Recently, though, I had to fix a header in an HDFS file, and this is the best I came up with:

sc.textFile(orig_file).map(fix_header).coalesce(1).saveAsTextFile(orig_file +'_fixed')

This is Spark (PySpark) code. Note the coalesce(1): the job is no longer parallel, but the benefit is that you get exactly one output file. Then just move/rename the file from the "orig_file + '_fixed'" directory to overwrite the original file.

PS: You could omit the .coalesce(1) part and the conversion would run in parallel (assuming a big file with multiple splits) and be much faster, but then you would have to merge the output HDFS files into one.
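If you do run the job in parallel, the resulting part files can be merged back into a single file with hadoop fs -getmerge; a sketch (directory and file names are illustrative):

```shell
# Concatenate all part-* files from the fixed output directory into one
# local file, then overwrite the original HDFS file with it.
hadoop fs -getmerge /data/orig_file_fixed merged.txt
hadoop fs -put -f merged.txt /data/orig_file
rm merged.txt
```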

PPS: The map call in the pipeline fixes the header through the fix_header function (not shown here for clarity).