Accessing a file that is being written

Posted 2019-02-02 16:32

Question:

You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?

a.) They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.
b.) They would see the current state of the file, up to the last bit written by the command.
c.) They would see the current state of the file through the last completed block.
d.) They would see no content until the whole file is written and closed.

From what I understand about the hadoop fs -put command, the answer is D; however, some say it is C.

Could anyone provide a constructive explanation for either of the options?

Thanks xx

Answer 1:

The file will not be accessible until it is completely written and closed (option D) because, in order to access a file, the request is first sent to the NameNode to obtain the metadata for the blocks that make up the file. The NameNode writes this metadata only after it receives confirmation that all blocks of the file were written successfully.

Therefore, even though the blocks are available, the user can't see the file until the metadata is updated, which is done after all blocks are written.



Answer 2:

As soon as a file is created, it is visible in the filesystem namespace. Any content written to the file is not guaranteed to be visible, however:

Once more than a block's worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers. (From Hadoop: The Definitive Guide, "Coherency Model".)

So, I would go with Option C.
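
To put numbers on this for the scenario in the question, here is a rough sketch of the arithmetic. It assumes the coherency rule quoted above holds exactly as stated (only fully written blocks are visible to new readers) and that 1 MB means 2^20 bytes. The function name visible_bytes is made up for illustration and is not part of any Hadoop API.

```python
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


def visible_bytes(bytes_written, block_size):
    """Bytes a new reader could see if only fully written blocks are visible.

    Hypothetical helper that models the quoted coherency rule;
    it does not talk to HDFS.
    """
    completed_blocks = bytes_written // block_size
    return completed_blocks * block_size


block_size = 64 * MB
written = 200 * MB

visible = visible_bytes(written, block_size)
print(visible // block_size)        # number of completed blocks
print(visible // MB)                # MB visible to a new reader
print((written - visible) // MB)    # MB still in the block being written
```

Under these assumptions, 200 MB written with 64 MB blocks gives three completed blocks (192 MB) that a new reader could see, while the remaining 8 MB sits in the fourth block that is still being written.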

Also, take a look at this related question.



Tags: hadoop hdfs