How are HDFS files stored on the underlying OS filesystem?

Posted 2020-02-29 10:42

Question:

HDFS is a logical filesystem in Hadoop with a block size of 64 MB. A file on HDFS is in turn saved on the underlying OS filesystem, say ext4, which has a block size of 4 KiB.

To my knowledge, for a file on the local filesystem, the OS locates each 4 KiB block by its start and end cylinders on the physical hard disk in order to retrieve it. Since HDFS files are also saved on the underlying ext4 filesystem, they too must be retrieved through the start and end cylinders of those 4 KiB blocks.

If that is the case, HDFS by itself won't increase the speed of data retrieval. So the question is: what technique does HDFS use, with respect to the hard disk, to increase retrieval speed?

Thanks in advance

Answer 1:

You are right that the retrieval speed of the underlying ext filesystem does not change. What changes is that a large file is split into blocks of, say, 64 MB, and those blocks are stored on different machines. When the file is read, for example by a MapReduce job, several machines can read their own blocks at the same time, so the combined throughput comes from many disks rather than one. The main machine (the NameNode) only keeps track of which machine holds which block; the data itself is read from the machines that store the blocks. This is how things speed up. It is like 10 men finishing a building task in 1 day rather than 1 man taking 10 days.
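
To make the idea concrete, here is a minimal Python sketch, not actual HDFS code. It works out how many 4 KiB ext4 blocks sit behind one 64 MB HDFS block, splits a file into block ranges, and reads those ranges concurrently. The function names (`split_into_blocks`, `read_block`, `parallel_read`) are hypothetical, and the threads only stand in for separate machines; in a real cluster each block would be read from a different node's disk.

```python
from concurrent.futures import ThreadPoolExecutor

HDFS_BLOCK = 64 * 1024 * 1024   # 64 MB HDFS block size, as in the question
EXT4_BLOCK = 4 * 1024           # 4 KiB ext4 block size, as in the question


def split_into_blocks(file_size, block_size=HDFS_BLOCK):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def read_block(data, offset, length):
    """Hypothetical stand-in for one machine reading its block from local disk."""
    return data[offset:offset + length]


def parallel_read(data, block_size=HDFS_BLOCK):
    """Read all blocks concurrently, then reassemble them in their original order."""
    blocks = split_into_blocks(len(data), block_size)
    with ThreadPoolExecutor(max_workers=len(blocks) or 1) as pool:
        # pool.map preserves input order, so the parts line up with the blocks
        parts = list(pool.map(lambda b: read_block(data, *b), blocks))
    return b"".join(parts)


if __name__ == "__main__":
    # One HDFS block is stored as this many ext4 blocks on the local disk
    print(HDFS_BLOCK // EXT4_BLOCK)            # 16384
    # A 1 GiB file is split into this many HDFS blocks
    print(len(split_into_blocks(1024 ** 3)))   # 16
    # Small demo with a tiny block size so it runs instantly
    sample = bytes(range(256)) * 10
    print(parallel_read(sample, block_size=1000) == sample)  # True
```

Each individual block is still read through ordinary 4 KiB ext4 blocks at ordinary disk speed; the gain comes only from having many disks work on different blocks at once.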



Tags: hdfs