Why is it legit to use no-op to fill gaps between

2019-06-18 22:48发布

I am learning Paxos algorithm (http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf) and there is one point I do not understand.

We know that events follow a timely order, and it happens when, say, events 1-5 and 10 are decided, but 6-9 and 11 thereafter are not yet. In the paper above, it says we simply fill in the gap between 6-9 with no-op values, and simply record new events from 11 and on.

So in this case, since event 10 is already recorded, we know some kinds of events must have happened between 5 and 10 but are not recorded by Paxos due to some failures. If we simply fill in no-op values, these events will lost in our recording.

Even worse, if, as the paper I linked above says, events are in fact commands from the client, then missing a few commands in the middle might make the entire set of operations illegal (if none of the commands can be skipped or the order of them matters).

So why is it still legit for Paxos to fill no-op values for gaps between events? (If the entire set of records might be invalid because of no-op values as I concerned above.) Also, is there a better way to recover from such gaps instead of using no-op?

2条回答
做自己的国王
2楼-- · 2019-06-18 23:13

This is a multi-part answer.

Proposing no-op values is the way to discover commands that haven't got to the node yet. We don't simply fill each slot in the gap with a no-op command: we propose each slot is filled with a no-op. If any of the peers have accepted a command already, it will return that command in the Prepare-ack message and the proposer will use that command in the Accept round instead of the no-op.

For example, assume a node was behind a temporary network partition and was unable to play with the others for slots 6-9. It knows it missed out upon learning the command in slot 10. It then proposes no-ops to learn what was decided in those slots.

Practical implementations also have an out-of-band learning protocol to learn lots of transitions in bulk.

A command isn't a command until it is fully decided; until then it is just a proposed command. Paxos is about choosing between contending commands from multiple clients. Clients must be prepared to have their commands rejected because another client's was chosen instead.

Practical implementations are all about choosing the order of client commands. Their world view is that of a write-ahead log, and they are placing the commands in that log. They retry in the next slot if they're command wasn't chosen. (There are many ways to reduce the contention; Lamport mentions forwarding requests to a leader, such as is done in Multi-Paxos.)

Practical systems also have some means to know if the command is invalid before proposing it; such as knowing a set of reads and a set of writes. This is important for two reasons. First, it's an asynchronous, multi-client system and anything could have changed by the time the client's command has reached the server. Second, if two concurrent commands do not conflict then both should be able to succeed.

查看更多
手持菜刀,她持情操
3楼-- · 2019-06-18 23:24

The system model allows commands (messages) to be lost by the network anyway. If a message is lost, the client is expected to eventually retry the request; so it is fine to drop some of them. If the commands of a client have to executed in client order, then either the client only sends commands synchronously; or the commands have to be ordered at a higher level in the library and kept in some client-session object before being executed.

AFAIK the Zab protocol guarantees client-order, if you don't want to implement that at a higher level.

查看更多
登录 后发表回答