I know there are some questions on this website that ask the same thing. However, the answers are never clear:
In PBFT, why can't the replicas execute the requests after 2/3 of them have prepared? Why is the commit phase needed? If 2/3 + 1 of the replicas have agreed to prepare, then I would think they can execute the request without broadcasting again?
(Edited) In addition to the previous (incomplete) answer, a quote from Practical Byzantine Fault Tolerance and Proactive Recovery might help. Note that the authors state that the prepare phase is enough to order requests within the same view, but it is not enough to order requests across view changes, and that is why the commit phase is needed:
"This ensures that replicas agree on a total order for requests in the same view but it is not sufficient to ensure a total order for requests across view changes. Replicas may collect prepared certificates in different views with the same sequence number and different requests. The commit phase solves this problem as follows."
Clients' requests must be totally ordered and executed in exactly the same order on every replica. Replicas reach consensus on the order of a request in the prepare phase by collecting prepare messages up to the quorum size you mentioned, but they do not execute it right away in that phase, because every replica has to execute the same request at the same position in the order. (In a State Machine Replication system, all the state machines have to deterministically execute the same requests in the same order to satisfy the safety condition; the execution order affects each state machine's state.)
So in the commit phase they reach consensus on when it is safe to execute, so that every correct replica executes the same request at the same logical point, preserving the safety condition.
Following your comment "Once the replicas received 2/3 prepared, they can commit": if replicas executed at that point, the internal states of the state machines (PBFT nodes) could diverge, violating the safety condition. That is why the commit phase is needed.
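If it helps, here is a minimal Python sketch of the two quorum checks described above. This is illustrative only, not the paper's pseudocode; the Replica class, the handler names, and the simplification that "prepared" just means 2f + 1 matching PREPAREs (the real rule also requires the matching PRE-PREPARE) are my own assumptions:

```python
# Illustrative sketch of PBFT's two quorum checks (simplified).
class Replica:
    def __init__(self, f):
        self.quorum = 2 * f + 1          # quorum size out of n = 3f + 1
        self.prepares = {}               # (view, seq, digest) -> set of senders
        self.commits = {}                # (view, seq, digest) -> set of senders
        self.executed = {}               # seq -> digest, filled only after commit

    def on_prepare(self, view, seq, digest, sender):
        self.prepares.setdefault((view, seq, digest), set()).add(sender)
        if len(self.prepares[(view, seq, digest)]) >= self.quorum:
            # Prepared: the order is fixed within THIS view, but executing now
            # would be unsafe -- a view change could still assign seq to a
            # different request at other replicas.
            self.send_commit(view, seq, digest)

    def on_commit(self, view, seq, digest, sender):
        self.commits.setdefault((view, seq, digest), set()).add(sender)
        if len(self.commits[(view, seq, digest)]) >= self.quorum:
            # Committed-local: 2f+1 replicas claim "prepared", so the binding
            # (seq, digest) survives any view change; now execution is safe.
            self.executed[seq] = digest

    def send_commit(self, view, seq, digest):
        pass  # placeholder: broadcast a COMMIT message to all replicas
```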
Answer to your comment:
The situation above is possible when replicas execute a request as soon as they collect a quorum of prepare messages. The important fact is that PBFT assumes partial synchrony: messages can be arbitrarily delayed by an unstable network or by an adversary, but are eventually delivered. So each replica may reach the prepared state for a request at a different point in time, and one such situation is illustrated above.
Answer to your second comment:
I think I need to elaborate on the answer by illustrating a coordinated attack by malicious replicas, including the primary. Let's say there are n replicas, where n = 3f + 1 = 100 and f = 33, in a Byzantine fault tolerant system. Such a system can tolerate up to f Byzantine faulty replicas. Now I give a counter-example to answer your question. Consider the following setting;
I partition the n replicas into three groups:
- G1 = {b1, b2, ... , b33}: Byzantine faulty replicas, including the Byzantine primary (b1), |G1| = 33
- G2 = {r1, r2, ... , r33}: a group of correct replicas, |G2| = 33
- G3 = {r34, r35, ... , r67}: a group of correct replicas, |G3| = 34
Because n = |G1| + |G2| + |G3| = 33 + 33 + 34 = 100, the above partition makes sense.
G1 is entirely controlled, in a coordinated way, by a super-genius hacker who is especially interested in destroying the protocol.
Now I will demonstrate how the above setting violates the safety condition if the commit phase is removed from the protocol. (The safety condition here means that the states of G2 and G3 must stay the same.) For simplicity of the description, the consensus value is a single binary value rather than a request with a sequence number.
- [Pre-Prepare phase]: The primary (b1) sends value 0 to G2 and value 1 to G3. This equivocation is possible because we assume a Byzantine primary.
- [Prepare phase]: Now the replicas in G2 and G3 exchange the message they received from the primary to check that they both got the same one. In this phase, the replicas in G1 send value 0 to G2 and value 1 to G3. After the message exchange, the situation is as follows:
a. Replicas in G2 have received 66 votes for value 0 and 34 votes for value 1.
b. Replicas in G3 have received 33 votes for value 0 and 33 + 34 = 67 votes for value 1.
Because the quorum size is 2f + 1 = 67, the replicas in G3 accept the value proposed by the Byzantine primary (who coordinates with the Byzantine replicas), while the replicas in G2 do not.
So even though the system should tolerate up to 33 Byzantine faulty replicas (including the primary), under your assumption it fails immediately.
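To make the tally concrete, here is the same counting in a few lines of Python (the numbers are just the example's assumptions: n = 100, f = 33, quorum = 2f + 1 = 67):

```python
# Vote tally for the equivocation scenario above (illustrative numbers only).
f = 33
quorum = 2 * f + 1                      # 67 out of n = 3f + 1 = 100

G1, G2, G3 = 33, 33, 34                 # Byzantine, correct, correct

# G1 (with the Byzantine primary) tells G2 the value is 0 and tells G3 it is 1.
votes_at_G2 = {0: G1 + G2, 1: G3}       # {0: 66, 1: 34}
votes_at_G3 = {0: G2, 1: G1 + G3}       # {0: 33, 1: 67}

print(votes_at_G2[0] >= quorum)         # False: G2 never reaches a prepare quorum
print(votes_at_G3[1] >= quorum)         # True:  G3 prepares the equivocated value 1
# If replicas executed as soon as they prepared, the replicas in G3 would have
# executed a request that G2 never accepted -- exactly the divergence the
# commit phase is there to rule out.
```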
Suppose there is no commit phase:
Assume that replica A has prepared (n, m) and then executes m immediately, while the other replicas have neither prepared nor executed it. At this point a view change occurs, and the new primary cannot communicate with A. The new primary will therefore accept the 2f+1 view-change messages from the other nodes, which have not prepared or executed m and which state that their maximum sequence number is n-1.
Therefore, in the new view, the primary may assign (n, m') to a subsequent request m', which contradicts the state of A.
PBFT does need the commit phase to ensure that message m stays bound to sequence number n even across a view change. This is because if a message m with sequence number n is committed at some replica after 2f+1 COMMITs are received, then (n, m) will be included in the NEW-VIEW message and thus rebroadcast to all other nodes by the new primary.
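A rough sketch of the view-change step in this story; the function name and the prepared_certs field are my own simplification, not PBFT's actual message format:

```python
# Illustrative only: how a new primary might derive the highest sequence
# number it must respect from 2f+1 VIEW-CHANGE messages.

def highest_prepared_seq(view_change_msgs):
    best = 0
    for vc in view_change_msgs:
        for seq, digest in vc["prepared_certs"]:
            best = max(best, seq)
    return best

f, n = 1, 5
# Replica A executed (n, m) after preparing alone, but A is unreachable.
# The 2f+1 reachable replicas never prepared anything at sequence n:
view_changes = [{"prepared_certs": [(n - 1, "digest_of_previous_request")]}
                for _ in range(2 * f + 1)]

print(highest_prepared_seq(view_changes))   # n - 1 = 4
# The new primary therefore feels free to assign sequence n to a new request
# m', contradicting what A already executed. With the commit phase, A could
# not have executed (n, m) unless 2f+1 replicas had prepared it, so at least
# one correct, reachable replica would report (n, m) in its VIEW-CHANGE.
```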
This question confused me for quite a long time, and today I finally got it. I will try to explain it clearly, but I suggest you be very familiar with the paper; at the very least you should know what "prepared" and "commit_local" mean.
Overview of PBFT
PBFT has three sub-algorithms: normal-case operation, view change, and garbage collection (checkpointing).
Why can't the commit phase be omitted?
The commit phase is key to safety during view changes.
From a high-level view, the commit phase ensures this invariant:
If commit_local(n, m, v) is true for some replica i, then commit_local(n, m', v+1) is false for every non-faulty replica i', for any m' ≠ m.
Proof:
commit_local(n, m, v) being reached at some replica i means that 2f+1 replicas (call them Q1) declared themselves prepared on (n, m, v). Any view-change quorum of 2f+1 replicas intersects Q1 in at least f+1 replicas, so at least one non-faulty replica from Q1 contributes its VIEW-CHANGE message to the new view. Hence there is at least one prepared certificate for (n, m) in the NEW-VIEW message, which is broadcast to all other replicas.
For any replica i' receiving the NEW-VIEW, it can be in one of two states:
1) It has already committed (n, m).
2) It has not committed (n, m) yet. It may still be waiting for COMMITs, or PREPAREs, or even the PRE-PREPARE, whatever.
In the second case, it will reprocess (n, m) from the pre-prepare phase in view v+1, so (n, m) is preserved across the view change. In short, the commit phase ensures that if (n, m) is committed at some replica, then (n, m) will be included in the NEW-VIEW and is therefore sent to all other replicas during the view-change phase.
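The arithmetic behind "some non-faulty replica in Q1 also ends up in the view-change quorum" is just quorum intersection; a quick illustrative check:

```python
# Any two quorums of size 2f+1 drawn from n = 3f+1 replicas overlap in at
# least (2f+1) + (2f+1) - (3f+1) = f+1 replicas, so at least one replica in
# the overlap is non-faulty.

for f in range(1, 6):
    n, q = 3 * f + 1, 2 * f + 1
    min_overlap = 2 * q - n              # = f + 1
    assert min_overlap >= f + 1          # more than the f possibly faulty replicas
    print(f, min_overlap)
```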
What if we omit the commit phase?
If we omit the commit phase, safety is destroyed, because m can be committed with sequence number n at some replicas while another m' is committed with the same n at other replicas.
Suppose a non-faulty replica commits a request as soon as (n, m) is prepared in view v. This means 2f+1 replicas (Q1) sent matching PRE-PREPARE/PREPARE messages for (n, m) in view v.
At the same time, it is possible that no other replica is prepared yet, since the network is only partially synchronous. After some timeout, a view change happens.
Now, since no other replica is prepared, it is possible that some view-change quorum contains no prepared certificate with sequence number >= n, so the new primary's max-s < n. The new primary will then assign n to a new message m' from a client's new request. Although at least f+1 non-faulty replicas in Q1 received the pre-prepare for (n, m) in view v, these old pre-prepares cannot prevent the new pre-prepare (n, m') in view v+1 from being accepted: they are not in the same view. Recall the condition for accepting pre-prepare messages!
Now (n, m) is committed at some replica while (n, m') is committed at other replicas across the view change.
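The "condition for accepting pre-prepare messages" that matters here is, roughly, that a backup only rejects a conflicting pre-prepare for the same view and sequence number. Here is a simplified sketch (signatures, watermarks, and the digest check against the request are omitted; the names are my own assumptions):

```python
# Simplified PRE-PREPARE acceptance check: a backup in view v accepts
# (v, n, d) unless it has already accepted a different digest for the SAME
# view and sequence number.

def accept_pre_prepare(state, v, n, d):
    if v != state["current_view"]:
        return False
    prior = state["accepted"].get((v, n))
    if prior is not None and prior != d:
        return False                      # conflict within the same view
    state["accepted"][(v, n)] = d
    return True

# A backup that accepted (v=0, n=5, m) and then moved on to view 1:
state = {"current_view": 1, "accepted": {(0, 5): "digest_of_m"}}
print(accept_pre_prepare(state, 1, 5, "digest_of_m_prime"))   # True
# The old pre-prepare from view 0 does not block (n, m') in view 1, which is
# why, without the commit phase, m' can take over sequence n after the view change.
```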
Conclusion: the commit phase ensures that if (n, m) is committed at some replica, then (n, m) will be included in the NEW-VIEW and is thus sent to all other replicas during the view-change phase.