I have a small query regarding hadoop data writes
From Apache documentation
For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure;
In below image, when the write acknowledge is treated as successful?
1) Writing data to first data node?
2) Writing data to first data node + 2 other data nodes?
I am asking this question because, I have heard two conflicting statements in youtube videos. One video quoted that write is successful once data is written to one data node & other video quoted that acknowledgement will be sent only after writing data to all three nodes.
Step 1: The client creates the file by calling create() method on DistributedFileSystem.
Step 2: DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it.
The namenode performs various checks to make sure the file doesn’t already exist and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. TheDistributedFileSystem returns an FSDataOutputStream for the client to start writing data to.
Step 3: As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline, and here we’ll assume the replication level is three, so there are three nodes in the pipeline. TheDataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline.
Step 4: Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.
Step 5: DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.
Step 6: When the client has finished writing data, it calls close() on the stream.
Step 7: This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete The namenode already knows which blocks the file is made up of , so it only has to wait for blocks to be minimally replicated before returning successfully.
Data write operation is considered successful if one replica is successfully written. It is governed by the property dfs.namenode.replication.min in hdfs-default.xml file.
If there is any failure of datanode while writing a replica, the data written is not considered unsuccessful, but under-replicated which while balancing the cluster creates those missing replicas.
Ack packet is independent of the status of data written to datanodes. Even if the data packet is not written the acknowledgement packet is delivered.