Assume my HDFS block size is 64 MB.
I have 2 files:
File A: 64MB * 3 + 2 MB;
File B: 62 MB;
There should be 4 blocks for File A: three with 64 MB each and one with 2 MB.
There should be one block for File B with 62 MB.
So in total there should be 5 blocks.
Just because there is "free" space in the File A block that stores only 2 MB, File B does NOT get appended to that same block. Is that correct?
I have seen some tutorials which say that the "free" space in the block is utilized.
Correct, there will be 5 blocks. What the tutorials mean is that a 2 MB block only physically takes up 2 MB on the datanode's hard disk, not a full block size, which would be a waste of space.
File A - 4 blocks: 3 of 64 MB and 1 of 2 MB.
File B - 1 block of 62 MB.
If the replication factor is 3, there will be (4+1)*3 = 15 block replicas in total.
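The block counts above can be checked with a few lines of arithmetic. This is only a sketch in plain Python, not an HDFS API; the function name `split_into_blocks` and the use of whole megabytes are assumptions made for illustration.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file of this size is split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover data
    return sizes


file_a = split_into_blocks(64 * 3 + 2)  # three full blocks plus a 2 MB block
file_b = split_into_blocks(62)          # a single 62 MB block

logical_blocks = len(file_a) + len(file_b)
replication = 3  # replication factor from the example above
block_replicas = logical_blocks * replication

print(file_a, file_b, logical_blocks, block_replicas)
```

Each file is split on its own, so File B never shares a block with the 2 MB tail of File A.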
What they seem to mean in the video is that HDFS (the DataNodes) stores blocks as files in the local file system, since HDFS is built on top of it. If a block occupies less than 64 MB, the remaining space in the local file system is left unoccupied and can be used by blocks of other files.
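To make the disk-space point concrete, here is a rough sketch comparing the space the five blocks actually occupy with what they would occupy if every block were padded to the full block size. It is plain Python arithmetic, not an HDFS call; the helper name `disk_usage_mb` is hypothetical, and any small per-block overhead on the datanode is ignored.

```python
BLOCK_SIZE_MB = 64  # block size from the question


def disk_usage_mb(block_sizes_mb, replication=1):
    """Physical space used: the sum of the actual block sizes, times the replication factor."""
    return sum(block_sizes_mb) * replication


# Blocks of File A (64, 64, 64, 2) followed by the single block of File B (62).
blocks = [64, 64, 64, 2, 62]

actual_mb = disk_usage_mb(blocks)         # what the data really occupies
padded_mb = len(blocks) * BLOCK_SIZE_MB   # what padding every block to 64 MB would cost

print(actual_mb, padded_mb)
```

The gap between the two numbers is the space that stays available to other blocks on the datanodes.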
Taking your case, the block size is 64 MB and you have the two files described above.
For reference, the namenode keeps roughly 150 bytes of metadata for each block. Whether the block holds 1 MB, 50 MB or 64 MB, the metadata is the same size.
In your scenario, File A: 64 MB * 3 = 3 blocks
2 MB = 1 block
Total for File A = 3 + 1 = 4 blocks.
File B = 62 MB = 1 block
How it works internally:
One might assume that if a 50 MB file is stored, the remaining 14 MB (64 - 50 = 14 MB) is wasted, but that is not how it works. Even if a block holds less than 64 MB, its metadata is the same size. The remaining 14 MB of disk space stays free on the datanode and can be used by other blocks, whose metadata is also the same size.
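As a rough illustration of why the metadata cost depends on the number of blocks and not on how full they are, here is a small sketch. The figure of about 150 bytes per namenode object (file or block) is a commonly quoted rule of thumb and is treated here as an assumption, not a measured value; the function name `namenode_metadata_bytes` is hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb cost per file or block object


def namenode_metadata_bytes(num_files, num_blocks, bytes_per_object=BYTES_PER_OBJECT):
    """Estimate namenode memory for the given number of file and block objects."""
    return (num_files + num_blocks) * bytes_per_object


# Two files (A and B) and five blocks in total, as in the question.
estimate = namenode_metadata_bytes(num_files=2, num_blocks=5)

# A completely full 64 MB block and a 2 MB block cost the same in this model.
one_full_block = namenode_metadata_bytes(num_files=0, num_blocks=1)
one_small_block = namenode_metadata_bytes(num_files=0, num_blocks=1)

print(estimate, one_full_block == one_small_block)
```

Block size never appears in the calculation, which is the point: a partly filled block costs the namenode the same as a full one, while on disk it takes only as much space as its data.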