-->

some questions about paxos

2019-09-11 01:26发布

问题:

i am confused by the value chose by proposer. use a example to explain. If now a proposer wants to lock a file, then it will send that l1 is the processer_number, and v1 is the value of "lock the file", and acceptors accept it. than the proposer wants to unlock the file, and sends (l2 > l1)that v2 is the value of "unlock the file", after that, acceptor return the last value and proposer picks it and send again.

in this example, v2 is lost? or what is the real process in this example? also, these are two rounds or one round? how to deal with the round?

回答1:

Paxos is not an atomic register; once a value is chosen by Paxos, it CANNOT change.

First, notice your lock is a finite state machine:

          on_lock
       .-----------.
       |           |
+------+---+    +--v-----+
| UNLOCKED |    | LOCKED |<--- start
+------^---+    +--+-----+
       |           |
       `-----------'
         on_unlock

Paxos could be used to decide a sequence of transitions; but each new transition must be decided in a new Paxos instance.

I suggest taking a look at some of the other paxos questions around StackOverflow:

  • paxos value choice
  • implementation of Paxos algorithm
  • Questions about Paxos implementation


回答2:

We need to describe a lock protocol that we can use Paoxs to run in order to understand when multiple values are available and to see the outcome of choosing a value.

The lock is a cell holding a lock token value in the format L={T,P}, where P is the process holding the lock and T is the time the process took the lock. To acquire the lock a client sends V={Lc,Ln} where Lc is what it thinks is the current lock token and Ln={T',P'} is the new lock token it wants to set. A special token Nil can be sent to unlock the lock. The new token is not set if the message does not state a correct Lc that matches the current token. This compare-and-swap (CAS) prevents a delayed unlock message from being mistakenly applied. It also works to timeout a lock when clients can steal it; if two processes send two racing messages {L1,L2} and {L1,L3} only one can succeed. A process learns if it's operation succeeded by inspecting the returned value which is the lock value L. A process can query the lock value by sending {Nil,Nil} which is a "do nothing" CAS that unlocks an open lock; but which returns who owns the lock if the lock is closed.

Writes must pass through the leader. If a node knows it is not the leader it should redirect the client to the leader. If node does not know who is leader it should throw an error and the client should pick another node at random. If the node thinks it is the leader it can only respond to the client when it is sure that a majority of nodes have accepted the new value. This is because Paxos ensures that a value accepted by a majority has been made durable by the cluster. If a node was leading, then does not hear a majority of acceptances, it cannot respond to the client. It may be isolated from the other nodes. The other nodes may have a newly elected leader. This also applies to {Nil,Nil} queries which need a majority of acceptances to confirm the leader is still the leader to tell the client the current lock value. Eventually the node should hear if a new leader exists else timeout trying to get the value accepted by a majority. It should then either redirect the client to the new leader else return an error to the client.

Now we can consider multiple values during leader failover. Client A sends a valid CAS update V1 which should succeed to the leader node X of a three node cluster. Node X sends accept(N1,V1) to itself and nodes Y and Z. It accepts its own value and also gets an acceptance from Y but the network drops the message to Z. Then node X goes dark and stops issuing any messages for a while. It may be dead or stalled but we don't yet know. It had seen a majority of X,Y but is now a mysterious Schrödinger's cat either dead or alive until we see another message from it to know its fate. This is where Paxos chooses to use collaboration to get a consistent and correct outcome no matter what happens next.

After a while node Z times out as it has not heard from a leader in too long. It issues propose(N2) to itself and the other nodes. It gets back promise(N2,V1) from node Y and promise(N2,empty) from itself. It has a majority Y,Z and can lead. Only node X knows that value V1 had been accepted by a majority and whether the client has been told it's CAS succeeded; but it is silent. Node Z has to make a conservative choice. If it were to assume X is dead it may be wrong: node X may be alive and may have told the client the operation succeeded. Node Z must collaborate and finish the partial work of the last leader by leading with V1 as its first value. So it sends accept(N2,V1) to all three nodes. Now it does not matter if node X is dead or alive or had told the client the operation succeeded or not. In all cases the locking protocol will not be violated and the client will find out it has the lock eventually if it retries upon error; it does not see nor care what proposal nor which node committed the work.



标签: paxos