Let's assume one is using default block size (128 MB), and there is a file using 130 MB ; so using one full size block and one block with 2 MB. Then 20 MB needs to be appended to the file (total should be now of 150 MB). What happens?
Does HDFS actually resize the size of the last block from 2MB to 22MB? Or create a new block?
How does appending to a file in HDFS deal with conccurency? Is there risk of dataloss ?
Does HDFS create a third block put the 20+2 MB in it, and delete the block with 2MB. If yes, how does this work concurrently?
According to the latest design document in the Jira issue mentioned before, we find the following answers to your question:
Hadoop Distributed File System supports appends to files, and in this case it should add the 20 MB to the 2nd block in your example (the one with 2 MB in it initially). That way you will end up with two blocks, one with 128 MB and one with 22 MB.
This is the reference to the append java docs for HDFS.
Here is a very comprehensive design document about append and it contains concurrency issues.
Current HDFS docs gives a link to that document, so we can assume that it is the recent one. (Document date is 2009)
And the related issue.