We all know that the block size in HDFS is quite large (64 MB or 128 MB) compared to the block size in traditional file systems. This is done to reduce seek time as a proportion of transfer time. (Transfer rates have improved far more than disk seek times, so a file system design tries to minimize the number of seeks relative to the amount of data transferred.) But a large block size brings the disadvantage of internal fragmentation, which is why traditional file system block sizes are kept small, on the order of a few KB, typically 4 KB or 8 KB.
I was going through the book Hadoop: The Definitive Guide and read that a file smaller than the HDFS block size does not occupy a full block's worth of underlying storage, but I couldn't understand how. Can somebody please shed some light on this?
According to Hadoop: The Definitive Guide:
Each block in HDFS is stored as a file on the DataNode in the underlying OS file system (ext3, ext4, etc.), and the corresponding details are stored in the NameNode. Let's assume the file size is 200 MB and the block size is 64 MB. In this scenario, the file will have 4 blocks, which correspond to 4 files on the DataNode of 64 MB, 64 MB, 64 MB and 8 MB (assuming a replication factor of 1).
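To make the arithmetic above concrete, here is a minimal sketch of how a file size maps to block file sizes. The function name split_into_blocks is made up for illustration and is not part of any Hadoop API; it only models the sizes, not how HDFS actually writes data.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes of the block files a file of file_size would map to.

    Hypothetical helper for illustration only: every block is full
    except possibly the last one, which holds just the remainder.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block is only as big as the data left
    return sizes


# The 200 MB file with a 64 MB block size from the answer above.
print([size // MB for size in split_with_mb] if False else
      [size // MB for size in split_into_blocks(200 * MB, 64 * MB)])
# prints [64, 64, 64, 8]
```

The last entry is 8 MB rather than 64 MB, which is the whole point: the final block file is only as large as the data it holds.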
Running ls -ltr on the DataNode will show the block details.
In a normal file system, even a very small file takes up a whole block (4 KB, say), because space is allocated in whole blocks. That doesn't happen in HDFS: a 1 GB file uses only 1 GB of disk space, not 4 GB. To be more clear:
In the OS: file size 1 KB, block size 4 KB, disk used 4 KB, wastage 3 KB.
In HDFS: file size 1 GB, block size 4 GB, disk used 1 GB, wastage 0 GB; the remaining 3 GB are free to be used by other blocks.
*Don't take the numbers seriously; they are made up to make the point clear.
If you have 2 different files of 1 GB each, there will be 2 blocks of 1 GB each. In a regular file system, if you store 2 files of 1 KB each, you will have 2 different files occupying 4 KB + 4 KB = 8 KB, with 6 KB of wastage.
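The wastage figures above can be checked with a small sketch of fixed-block allocation. The names allocated_size and wasted are hypothetical, and the model is deliberately simple: it assumes the file system rounds every non-empty file up to a whole number of blocks and ignores inodes, inline data and other real-world details.

```python
KB = 1024
GB = 1024 ** 3


def allocated_size(file_size, fs_block_size):
    """Bytes a fixed-block file system would allocate for file_size bytes.

    Assumption: space is handed out in whole blocks, so the size is
    rounded up to the next multiple of fs_block_size.
    """
    blocks = -(-file_size // fs_block_size)  # ceiling division
    return blocks * fs_block_size


def wasted(file_size, fs_block_size):
    """Internal fragmentation: allocated bytes that hold no data."""
    return allocated_size(file_size, fs_block_size) - file_size


# One 1 KB file on 4 KB blocks: 4 KB allocated, 3 KB wasted.
print(allocated_size(1 * KB, 4 * KB) // KB, wasted(1 * KB, 4 * KB) // KB)  # prints 4 3

# Two such files: 8 KB allocated, 6 KB wasted.
print(2 * allocated_size(1 * KB, 4 * KB) // KB, 2 * wasted(1 * KB, 4 * KB) // KB)  # prints 8 6

# A 1 GB HDFS block file stored on the same 4 KB local file system wastes nothing,
# because 1 GB is an exact multiple of 4 KB.
print(wasted(1 * GB, 4 * KB))  # prints 0
```

Under this model the waste per file can never exceed one local file system block, regardless of how large the HDFS block size is configured to be.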
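So this makes HDFS much better than a regular file system in this respect. The irony is that HDFS stores its blocks on the local file system, so in the end each block file is subject to the same issue at the local file system's block granularity.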
The block division in HDFS is just a logical layer built over the physical blocks of the underlying file system (e.g. ext3 or FAT). The file system is not physically divided into blocks of 64 MB or 128 MB or whatever the HDFS block size may be. It's just an abstraction used for storing the metadata in the NameNode. Since the NameNode has to hold the entire metadata in memory, there is a limit on the number of metadata entries, which explains the need for a large block size.
Therefore, three 8 MB files stored on HDFS logically occupy 3 blocks (3 metadata entries in the NameNode) but physically occupy 8*3 = 24 MB of space in the underlying file system.
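A rough sketch of that distinction between logical blocks and physical space follows. The function hdfs_footprint is a made-up name, and the model assumes a replication factor of 1, one NameNode metadata entry per block, and no rounding by the local file system.

```python
MB = 1024 * 1024


def hdfs_footprint(file_sizes, hdfs_block_size):
    """Return (block_entries, physical_bytes) for a list of file sizes.

    Hypothetical model: each file is split into ceil(size / block size)
    blocks, each block costs one metadata entry, and each block file
    takes up only as many bytes as the data it holds.
    """
    block_entries = 0
    physical_bytes = 0
    for size in file_sizes:
        block_entries += -(-size // hdfs_block_size)  # ceiling division
        physical_bytes += size
    return block_entries, physical_bytes


# Three 8 MB files with a 64 MB block size: 3 entries, 24 MB on disk.
entries, used = hdfs_footprint([8 * MB] * 3, 64 * MB)
print(entries, used // MB)  # prints 3 24

# The same 24 MB stored as a single file needs only 1 entry.
entries, used = hdfs_footprint([24 * MB], 64 * MB)
print(entries, used // MB)  # prints 1 24
```

The disk usage is identical in both cases; only the number of metadata entries changes, which is the cost that the NameNode's memory limit cares about.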
The large block size is chosen to balance efficient use of storage space against the limit on the NameNode's memory.