In Raft, is it possible to have a majority consensus?

Posted 2019-05-12 20:05

Consider this simulation on the official Raft webpage:

[Screenshot of the Raft visualization: S2 is the leader in term 6; the entry at term 2, index 1 is replicated on S2, S3, and S4 but remains uncommitted]

Why is the entry at term 2, index 1 not committed, despite S2 (the leader), S3, and S4 agreeing on the log? I ran this for several minutes to make sure all communication had taken place.

Weirdly enough, if I add one more log entry (term 6, index 2), then term 2, index 1 does get committed.

Does anyone know what rule is preventing term 2, index 1 from being committed?

1 Answer
别忘想泡老子
Answered 2019-05-12 20:44

Your leader is in term 6, but none of the log entries are from term 6; this triggers a special rule in Raft. The leader does not auto-commit an entry from a prior term, because in some situations it is unsafe to do so. Sections 5.3 and 5.4 discuss this at length (see also Figure 8).

From section 5.4.2:

To eliminate problems like the one in Figure 8, Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas; once an entry from the current term has been committed in this way, then all prior entries are committed indirectly because of the Log Matching Property. There are some situations where a leader could safely conclude that an older log entry is committed (for example, if that entry is stored on every server), but Raft takes a more conservative approach for simplicity.
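That rule can be sketched as a commit-index check on the leader. This is a minimal sketch, not code from any real implementation; the function and parameter names (`match_index`, `cluster_size`, the 1-based log with a `None` sentinel at index 0) are my own conventions:

```python
def maybe_advance_commit_index(current_term, commit_index, log, match_index, cluster_size):
    """Advance commit_index only via entries from the leader's current term.

    log: 1-based list of entry terms (log[0] is a None sentinel).
    match_index: for each follower, the highest log index known replicated there.
    Returns the new commit index (unchanged if nothing qualifies).
    """
    for n in range(len(log) - 1, commit_index, -1):
        # Count the leader itself plus every follower that has index n.
        replicas = 1 + sum(1 for m in match_index if m >= n)
        # Majority alone is NOT enough: the entry at n must be from the
        # leader's current term. Committing n then implicitly commits all
        # earlier entries via the Log Matching Property.
        if replicas * 2 > cluster_size and log[n] == current_term:
            return n
    return commit_index

# The question's scenario: 5 servers, leader in term 6, a term-2 entry
# at index 1 replicated on S3 and S4 -- majority, but wrong term.
print(maybe_advance_commit_index(6, 0, [None, 2], [1, 1, 0, 0], 5))      # stays 0
# After appending a term-6 entry at index 2 and replicating it to a
# majority, index 2 commits -- and index 1 commits indirectly with it.
print(maybe_advance_commit_index(6, 0, [None, 2, 6], [2, 2, 0, 0], 5))   # becomes 2
```

This matches the behavior you observed: the term-2 entry only commits once an entry from the leader's own term lands on a majority.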

Your example is perfectly set up to show why this is unsafe. Let's assume that S2 does commit, and then we'll break things by committing two different values into the same slot.

  1. S2 commits slot 1 (locally).
  2. S2 sends AppendEntries(commitIndex=1, []).
  3. S3 receives and applies AppendEntries(commitIndex=1).
    • The term-2 entry is now committed on two hosts.
    • The other hosts do not receive the message.
  4. S1 is elected leader.
    • S1's log is 'more up-to-date' than any other (§5.4.1), so it easily wins an election.
  5. S1 sends AppendEntries([4]).
    • The first thing a leader does is make all the other logs look like its own.
  6. S4 receives and applies AppendEntries([4]).
    • This overwrites its term-2 value at slot 1.
  7. S5 receives and applies AppendEntries([4]).
  8. S1 commits 4 locally.
    • We've broken it! Two committed values for the same slot.
  9. S2 and S3 receive and apply AppendEntries([4]).
    • We've doubly broken it: we've lost committed data!
    • A good engineer would have put an assertion here to catch this overwrite.

What happened? In essence, we elected two different leaders for the same slot. (S2 was elected leader even though S1 was more up-to-date.)

Why, then, is it safe for a leader to commit an entry of its own term without waiting for a follow-up request? Because it is impossible to get into the situation above. Consider the case where two nodes (S2 and S1) each think they are the leader, in terms 2 and 3 respectively. If S2 is ready to commit the term-2 entry into slot 1, then a majority holds that value there. That means no majority could have accepted any entry with a higher term in slot 1, and it means S1, to be elected leader for term 3, must itself have the term-2 entry in slot 1.
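The §5.4.1 "more up-to-date" comparison this argument relies on can be sketched as the check a voter runs before granting its vote. A minimal sketch with my own names, assuming each side is summarized by the term and index of its last log entry:

```python
def candidate_is_up_to_date(cand_last_term, cand_last_index,
                            voter_last_term, voter_last_index):
    """Voter-side check from §5.4.1: grant the vote only if the
    candidate's log is at least as up-to-date as the voter's.

    Logs are compared by the term of the last entry first; if the
    terms match, the longer log wins.
    """
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    return cand_last_index >= voter_last_index

# A voter holding the majority-replicated term-2 entry at index 1
# rejects any candidate whose log ends before that entry, so any
# term-3 leader must carry it.
print(candidate_is_up_to_date(4, 1, 2, 1))  # True: higher last term wins
print(candidate_is_up_to_date(1, 9, 2, 1))  # False: a long stale log loses
```

This is why, once a majority holds an entry, every future leader's log must contain it: any candidate missing it cannot collect votes from that majority.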

(BTW, it took me a minute to figure out how you got into this situation in the first place.)

As an aside, Raft is marketed as a "more understandable version of Paxos". I disagree: it seems to have just as many (if not more) corner cases such as these. But, the Raft author is really good at making it easy for an engineer to make a correct implementation of something practical. This has everything to do with how the author wrote the Raft paper.
