paxos value choice

2019-04-10 08:52发布

I've read about paxos on wiki page and the paper (paxos made simple). However, I'm still confused by some details:

  1. In phase 1a, does the proposer include the value that it intends to choose in the proposal to acceptors?

  2. In phase 1b, acceptor is supposed to return the value that it accepted previously if any. what is the life time of the value? IOW, when is it considered accepted and when does it get overwritten/dropped?

Some updates about the life time question. IIUC, after the first acceptance, an acceptor should always have a previously accepted value at hand. How does it decide if it should return it in the next promise (1b) phase? Or when does it decide to forget the value?


Updates 2 to better discuss with @MichealDeardeuff:

I have bellow understanding for paxos:

Normally in a paxos workflow, an acceptor should always have a previously accepted value at hand. And when answering a prepare request, it will send the value back in the promise response. And the proposer need to check if the value is the same value as itself proposed in the last round. If it is not, the proposer proceeds to send accept request with the value returned by acceptor. If it is, the proposer then proceeds to send Accept request with the value it intended to send in current round.

Is above understanding correct?

If it is not correct, would you please explain why?

If it is correct, I am confused because the paxos protocol doesn't explicitly say so. It only states:

where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.

However, to my understanding, the proposer need to check and see if the value returned by acceptor is the same value as the same proposer proposed in the last round. If it is, even though there is a value with the highest-numbered proposal among the promise responses, the proposer can still choose any value it wants as if there were no values returned by acceptors.

And that is the reason that I want to see if there is any reference to support the understanding.

Thanks very much!

标签: cloud paxos
1条回答
Rolldiameter
2楼-- · 2019-04-10 08:54

In phase 1a, does the proposer include the value that it intends to choose in the proposal to acceptors?

The proposer doesn't need to send the value it intends to choose until phase 2a. See also my answer to an earlier question.

In phase 1b, acceptor is supposed to return the value that it accepted previously if any. what is the life time of the value? IOW, when is it considered accepted and when does it get overwritten/dropped?

From my answer to another different question, "An acceptor should accept all values unless it has promised to not accept it." That is, it should accept any value with a round-id greater than or equal to the round-id in the last response it sent.

The reason it must accept any other value is that a proposer may discover (as a result of phase 1) that a different value should be—or already has been—chosen; it then broadcasts this value to all the acceptors. This discrepancy can happen when there are multiple proposers and/or during a network partition.


Update to answer the comment.

When an acceptor accepts a value, it holds on to that value until it accepts another value. (along with this it stores the round of the value).

In Phase 1b, the Acceptor always sends its value, allowing the Proposer to sort out what the actual chosen value is. It never forgets its current value. Ever.

After receiving the promises from the acceptors, the proposer has a nice view of the system. (Note: it is not necessarily a complete or up-to-date view.)

It could be that different acceptors have accepted different values due to some contention with a different proposer. So, the proposer chooses the value with the highest round id and sends that value to all the acceptors. This is why acceptors values are allowed to change.

The concern in the comments is that this breaks Paxos. Not so.

But, before I get into an example, let me stress that it's much easier to think about Paxos with named phases and messages instead of 1a, 1b, 2a, 2b. I typically name the phases the Prepare phase and the Accept phase. With message 1a as Prepare, 1b as Promise, 2a as Accept, and 2b as Accepted.

Now, let's take a hypothetical example that people often give, and which they think breaks Paxos. Suppose we have three Acceptors A1, A2, and A3. A1 and A2 have both accepted value ABC at round 1 and A3 has chosen XYZ at round 2 (ie. from a different proposer). We can see that A1 and A2 form a majority and that ABC has been "chosen."

Continuing along this hypothetical example, a proposer sends Prepare(3) and receives back responses from A2 and A3, viz Promise(ABC @ 1) and Promise(XYZ @ 2). The Proposer sees XYZ has the highest round, and sends that along in the Accept phase, overwriting ABC on the other hosts. And viola, Paxos is broken, Right?

No. The problem is with the start state, which is impossible. Let me show you why.

First, some propositions, which are key to Paxos running correctly:

Proposition A: For A1 and A2 to have the value ABC @ 1, a proposer must have sent Accept(ABC @ 1) which means it must have received a majority of Promises in response to sending Prepare(1).

Proposition B: For A3 to have the value XYZ @ 2, a proposer must have sent Accept(XYZ @ 2) which means it must have received a majority of Promises in response to sending Prepare(2).

And now the proof:

Case 1: A1 and A2 already have values ABC @ 1. By Proposition B, for A3 to have value XYZ @ 2 it must have received a majority of Promises from the acceptors. Since value ABC is on a majority of acceptors, then at least one of them must have sent Promise(ABC @ 1). With 1 being the highest round id, the proposer must have select the value ABC and sent Accept(ABC @ 2). But it didn't, it sent Accept(XYZ @ 2). A contradiction.

Case 2: A3 already has the value XYZ @ 2. (Remember, round 2 can come before round 1: the round id's are only monotonically increasing per proposer.) By Proposition A, for A1 and A2 to have value ABC @ 1, a proposer must have received a majority of Promises from the acceptors. Likewise, by Proposition B, for A3 to have it, too, must have received a majority of Promises. That means there must have been at least one acceptor in the set {A1, A2} that Promised twice.

Case 2a: The acceptor sent Promise(1) first, then Promise(2). Then when it received the message Accept(ABC @ 1), it must have rejected it because it has promised to accept no value earlier than 2. But it didn't reject it because it has the value ABC @ 1. A contradiction.

Case 2b: The acceptor sent Promise(2) first. Then, when it received the message Prepare(1), it must have rejected it because it had already promised for a higher round. But it clearly didn't because, by proposition A, it sent Promise(1). A contradiction.

Wow!

I hope this helps you out.


Update 2: Here's a pseudo-python implementation of Paxos

from itertools import count
from concurrent import futures



class Acceptor():
    def __init__(self):
        self.acceptedValue = None
        self.acceptedRound = None

        self.promisedRound = None

    def onPrepare(self, message):
        if self.promisedRound > message.round:
            send(message.source, new Nack())

        self.promisedRound = message.round
        response = new Promise(round=self.acceptedRound, value=self.acceptedValue)
        send(message.source, response)

    def onAccept(self, message):
        if self.promisedRound > message.round:
            send(message.source, new Nack())

        self.acceptedValue = message.value
        self.acceptedRound = message.round
        send(message.source, new Accepted())




class Proposer():

    def __init__(self, selfId, quorum):
        self.quorum = quorum
        self.id = selfId

        # get a unique starting round, 
        # assumes acceptors and proposers are on same hosts
        self.initialRound = sorted(quorum).index(selfId)

    def propose(self, value):
        "Propose a value to be chosen; returns actual value chosen"

        # round is always unique to a proposer
        # count(start, step) returns an infinite sequence
        for round in count(self.initialRound, len(self.quorum))
            promises = self.prepare(round)

            if not promises: continue

            # interpret the prepare phase results, may be None
            selectedValue = max(promises, key=lambda p: p.round or -1)

            accepted = self.accept(round, selectedValue or value)

            if accepted: return value

    def prepare(self, round):
        "Drive the Prepare phase. returns the promises from the acceptors or None"

        message = new Prepare(round, source=self.id)
        responses = []
        for acceptor in self.quorum:
            responses.append(send(acceptor, message)))

        # wait for the results
        promises = []
        for response in futures.as_completed(responses):
            if response.exception(): continue # message died

            message = response.result()
            if not isinstance(message, Promise):
                # A nack message; abort the round. We may have got a majority,
                # but this speeds up the case when we didn't.
                return None

            promises.append(message)
            if (len(promises) > len(self.quorum) / 2:
                return promises

        # No majority of responses
        return None

    def accept(self, round, value):
        "Drive the Accept phase. returns True iff the value was accepted"

        message = new Accept(round, value)
        responses = []
        for acceptor in self.quorum:
            responses.append(send(acceptor, message)

        # wait for the results
        accepts = 0
        for response in futures.as_completed(responses):
            if response.exception(): continue # message died

            message = response.result()
            if not isinstance(message, Accepted):
                # A nack message; abort the round. We may have got a majority,
                # but this speeds up the case when we didn't.
                return False

            accepts = accepts + 1
            if accepts > len(self.quorum) / 2:
                return True

        # No majority of responses
        return False

Update 3 To answer more questions from the comments.

…if it is the same value proposer sent in last round, we want proposer to ignore it (otherwise we end up in a dead loop, right?)

(I’m assuming that "ignoring" the value means exiting the loop early.)

First, please notice the loop ends when the proposer receives acknowledgement from a majority of the acceptors that a value has been chosen. So, if a proposer finds itself doing a subsequent Prepare phase, you can be sure that it is contending with another proposer OR some network partition happened. Second, please notice the prepare phase returns a value even if only one acceptor has accepted a value (meaning the value may not have been accepted by a majority.)

If the Prepare phase results in a value and it’s the same value it saw in the previous round, it should carry on with the Accept phase anyway for several reasons.

  • If the proposer quits the loop early, It may be that the other proposer has failed. This may result in no value being chosen.
  • If it exits the loop early, it is reasonable the other proposer, following the same algorithm has exited the loop too; again resulting in possibly no value being chosen.
  • If it so happens that the value actually had been chosen, it may be that not all of the nodes have received the value (due to network partition) or have a different value (due to that fight with the other proposer). It is good, then, to push the value to the nodes for durability sake.

On the other hand, it is possible for Paxos to go into an infinite loop when there is contention between two or more proposers and some bad luck. (This is true of any consensus algorithm and was proven by Liskov et al before any consensus algorithm was actually discovered.) So the theory says, but in practical systems it is easy to get around: on each subsequent round, pause for a random amount of time to let the other proposer have the chance of winning the race. Sure, it is still possible to have an infinite loop, but it unlikely to ever happen in the lifetime of the universe.

Do you have any reference to support the argument?

I speak mostly from my own learning and experience. I am on a team that develops and maintains a system built with Paxos at the core. I’ve also done extensive study on the subject.

Here are some good papers on the subject:


Update 4 Answering updates in the question.

Normally in a paxos workflow, an acceptor should always have a previously accepted value at hand. And when answering a prepare request, it will send the value back in the promise response. And the proposer need to check if the value is the same value as itself proposed in the last round. If it is not, the proposer proceeds to send accept request with the value returned by acceptor.

Okay, so far. But the proposer doesn't need to keep state between rounds. We'll soon find out why.

If it is [the same value], the proposer then proceeds to send Accept request with the value it intended to send in current round.

If any value is returned by an acceptor, the one with the highest round-id must be used in the Accept phase. If no value is returned by an acceptor, than any value may be used: my logic suggests this would be the value the client passed to the proposer.

Compare this with Paxos Made Simple (pg 6):

...where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.

(It’s kind of hard that terminology switches between the papers. Here Lamport uses the term numbered proposals, elsewhere they he uses the term round-id. There are the one and the same.)

Suppose a proposer runs a prepare phase and sees this state among the acceptors.

             A1   A2  A3
promise:      3    ?   4
value@round:  x@3  ?  y@4

where the actual state is

              A1  A2  A3
promise:       3   4   4
value@round:  x@3 y@4 y@4

Now, suppose it had run the previous round with the value y. IF it is allowed to choose any value at this point, it could force this state:

              A1  A2  A3
value@round:  z@5 z@5 z@5

This overwrites the chosen value, violating the consistency property of consensus (Viz, At most one value can be chosen.).

If you want this kind of behavior, you cannot use a consensus protocol as is. You can use protocols like ABD, or other round-based register. Or you can think of Paxos as choosing a transition in a state machine. In other words, each time you want to change a variable you run a new Paxos algorithm to choose the transition. This is what my system does (and I dare say most practical Paxos systems).

Note, the ABD and round-based registers act similarly to Paxos, but are a little simpler because they don't have to guarantee the above consistency property (once chosen, always chosen). But a Paxos-based state machine allows for CAS operations on the variables.

查看更多
登录 后发表回答