I am trying to come up with a mutual exclusion algorithm that is based only on atomic reads and atomic writes of shared memory (i.e. no compare-and-swap or similar).
Apart from mutual exclusion and deadlock freedom, it needs to fulfill the following properties:
- It needs to be robust in the face of processes dying at any point
- It needs to handle an ex-ante unknown number of processes competing for the lock
- Every few seconds I need one of the processes to enter the critical section (I don't care which)
- It should be as memory-efficient as possible
- All my processes share a reasonably synchronized clock-source (sub-second accuracy)
I have come up with a way that looks to me like it could work, but I wanted to run it by you guys to see if I made a mistake somewhere or if there is a more elegant solution.
The idea is to combine a modified version of Szymanski's Three-Bit Linear Wait Algorithm (Figure 1 on page 4 of "Mutual Exclusion Revisited") with a process-naming-scheme based loosely on Moir and Anderson's "Wait-Free Algorithms for Fast, Long-Lived Renaming" (M&A).
As a first step, I define an "order" for incoming processes by assigning them an ID like M&A:
M&A use the following mutex primitive (see Figure 2 on page 5):
where out of n processes entering the block, at most 1 will stop there, at most n - 1 will move right, and at most n - 1 will move down.
These primitives can be linked together into a grid, where I decided to number the boxes differently from M&A, to allow me to calculate the box number without knowing the total number of boxes:
The box number can be calculated using box = (m*(m-1))/2 + r
, where m is the number of total moves made (including the move into the top-left box, i.e. m ≥ 1) and r is the number of moves to the right (r ≥ 0).
Any new process will initially be assigned a large, globally unique ID (e.g. timestamp + huge random number), which it will use as the value for p in the mutex primitive algorithm above. It will then start traversing the grid following the rules of the primitive, starting in the top-left corner, until it stops in some box. The number of the box in which it stopped will now be its new process ID.
To keep the number of process IDs smaller (processes might disappear; and also: boxes can be locked forever without being used if all entering processes move either right or down and none stop there), I will replace the boolean variable Y in the primitive above with a time-stamp.
The line Y := true
will then be replaced with Y := now
, where now is the current time. Similarly, the condition if Y then ...
will be replaced with if (now - Y < timeout) then ...
, where timeout is the time after which a box will be returned to the available pool.
While a process is still running, it needs to keep setting the Y-value of its box to now to never allow its ID to expire. While smaller values of timeout will lead to smaller IDs and less memory use, it can theoretically be chosen arbitrarily large and thus it should always be possible to find a value for timeout that will guarantee that processes will never lose their ID unless they actually quit or died. Values of minutes, hours, or even days should do the trick.
Now, we can use this process order to implement Szymanski's algorithm:
communication variables:: a, w, s: boolean = false
private variables:: j: 0..n
p1 ai=true;
p2 for(j=0;j<n;j++)
while(sj);
p3 wi=true;
ai=false;
p4 while(!si) {
p5 for (j=0;j<n & !aj;j++);
p6 if (j==n) {
si=true;
p6.1 for (j=0;j<n & !aj;j++);
p6.2 if (j<n)
si=false;
p6.3 else {
wi=false;
p6.4 for(j=0;j<n;j++)
while(wj);
}
}
p7 if (j<n)
for (j=0;j<n & (wj | !sj);j++);
p8 if (j!=i & j<n) {
p8.1 si=true;
wi=false;
}
}
p9 for(j=0;j<i;j++)
while(wj | sj);
Critical Section
e1 si=false;
The idea behind this algorithm is a little involved and for brevity (if that isn't already a lost cause at this point), I will not try to paraphrase it again here (Wikipedia also discusses a variant of this algorithm).
To make this algorithm work as needed (i.e. in the face of process aborts and an unknown total number of processes) the following changes can be made:
Just like Y above, the booleans ai, wi, and si will be replaced with timestamps. Except, this time the timeouts need to be short enough to still allow the critical section to be entered as often as needed, in my case every few seconds, so a timeout of 5s might work. At the same time, the timeout needs to be long enough so that it never expires unless a process dies, i.e. it needs to be longer than the worst-case time it takes processes to keep updating the timestamps for flags that need to remain set.
The loops over processes (e.g. for(j=0;j<n;j++)
) will be turned into traversals of the process-ID grid from above in the order of the box-numbers. Each box can have 3 states: Never visited by anyone, a process has stopped in it, or all processes moved on and the box is currently blocked. No distinction needs to be made between the latter two, since a blocked box will simply correspond to a process that is currently not trying to enter the critical section (i.e. ai, wi, and si are false/expired). The loop starts in box 0. If it comes across a box that has been visited (i.e. has its X-value set), it adds the adjacent boxes to its "to-check-list" (i.e. if box 8 was seen, box 12 and 13 will be added to the list). The loop terminates when the "to-check-list" has been processed completely (unless, of course, there is another stopping condition present that fires first (e.g. j < i
or & !aj
)).
Now to my questions:
- Does this look like it would work reliably?
I've never done a formal proof of a framework like this. My gut (and intense pondering) tell me that it should be solid, but I wouldn't mind something a bit more rigorous than that. I'm most worried about the effects of a process dying between p3 and e1 of Szymanski's algorithm and the flags expiring randomly while other processes are in that part of the code. - Is there a better solution (simpler/less resource use)?
One point of attack for an optimization might be the fact that I don't need all processes to eventually make it to the critical section, I just need one (and I don't care which one either).
Sorry for the length of this post and thank you for your endurance reading it! Suggestions for shortening welcome... :)
PS: I also posted this question on the CS stack exchange.