I know that HDFS is write-once, read-many.
If I want to update a file in HDFS, is there any way to do it?
Thank you in advance!
Option 1:
If you just want to append to an existing file:
echo "<Text to append>" | hdfs dfs -appendToFile - /user/hduser/myfile.txt
or
hdfs dfs -appendToFile - /user/hduser/myfile.txt
and then type the text on the terminal. Once you are done typing, hit Ctrl+D.
Option 2:
Get the original file from HDFS to the local filesystem, modify it, and then put it back on HDFS:
hdfs dfs -get /user/hduser/myfile.txt
vi myfile.txt
#or use any other tool and modify it
hdfs dfs -put -f myfile.txt /user/hduser/myfile.txt
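The get/edit/put cycle above can also be scripted non-interactively. A minimal sketch, assuming the same path as above; the sed expression is only a placeholder edit, use whatever tool you like:

```shell
# Download the file, edit it locally, then overwrite the HDFS copy.
hdfs dfs -get /user/hduser/myfile.txt .
sed -i 's/old_value/new_value/g' myfile.txt        # placeholder edit, any tool works
hdfs dfs -put -f myfile.txt /user/hduser/myfile.txt  # -f overwrites the original
```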
If you want to add lines, you must put another file on HDFS and concatenate the files.
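A hedged sketch of that approach (extra.txt, part1.txt, part2.txt, and combined.txt are illustrative names, not from the answer):

```shell
# Append a local file's contents to the end of an existing HDFS file.
hdfs dfs -appendToFile extra.txt /user/hduser/myfile.txt

# Or concatenate two HDFS files into a new one by streaming through the client.
hdfs dfs -cat /user/hduser/part1.txt /user/hduser/part2.txt \
  | hdfs dfs -put - /user/hduser/combined.txt
```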
To modify any portion of a file that is already written, you have three options:
Get the file from HDFS and modify its content locally:
hdfs dfs -copyToLocal /hdfs/source/path /localfs/destination/path
or
hdfs dfs -cat /hdfs/source/path | modify...
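A fuller sketch of this streaming variant; the sed edit and the .new suffix are assumptions for illustration. Since HDFS files cannot be rewritten in place, the modified stream goes to a new file that then replaces the old one:

```shell
# Stream the file through a filter, write the result to a new HDFS file,
# then swap it in place of the original.
hdfs dfs -cat /hdfs/source/path | sed 's/foo/bar/g' | hdfs dfs -put - /hdfs/source/path.new
hdfs dfs -rm /hdfs/source/path
hdfs dfs -mv /hdfs/source/path.new /hdfs/source/path
```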
Use a processing technology such as MapReduce or Apache Spark to produce the updated version; the result will appear as a directory of files, and you can then remove the old files. This is usually the best way.
Install NFS or FUSE; both support append operations.
NFS Gateway
Hadoop FUSE (mountableHDFS) allows HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of HDFS using standard Unix utilities such as ls, cd, cp, mkdir, find, and grep.