There are several operations which POSIX-compliant operating systems can do atomically with filesystem objects (files and folders). Here is a list of such presumably atomic operations:
- rename or move file or folder
- create hardlink
- create symlink
- create folder
- create and open an empty file
Is it possible to build a Compare-and-Swap algorithm for manipulating a file based on these operations?
Suppose several processes perform concurrent reads and writes on a single file. The file is characterized by its revision. Say the revision is appended to the file name, and there is a symlink to the file which the processes use to read it. The processes cannot (for some reason) synchronize with mutexes, semaphores and so on, but they are able to create auxiliary files and folders. Can they perform revision-based Compare-and-Swap modifications of the file (create a new file, then create and rename a symlink), in the sense that if several processes try to modify it simultaneously, one will succeed and the rest will fail with some error code?
The algorithm has to be resistant to sudden termination of any process at any step of the algorithm.
Oh boy.
Let's assume that each process has access to a unique identifier, so that symmetry can be broken. Here's a wait-free implementation of a one-shot consensus object.
- Create a directory with a unique name.
- Create a file in that directory whose name is the creating process's input.
- Rename the directory to the name of the consensus object. This will fail unless this is the first such rename, because rename() refuses to replace a non-empty directory (it fails with ENOTEMPTY or EEXIST).
- List the directory to which we tried to rename our own. The name of the file inside is the consensus decision.
Now it's possible to simulate an arbitrary object in a wait-free manner, using the standard result in distributed computing that consensus is universal (Herlihy's universal construction). Have fun garbage collecting =P
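The four steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a hardened implementation: the function name `decide`, the `tmp-` prefix, and the use of `uuid4` as the per-process unique identifier are all assumptions made for illustration, and the proposal is assumed to be a valid file name (non-empty, no `/`).

```python
import errno
import os
import uuid


def decide(base_dir: str, object_name: str, proposal: str) -> str:
    """Propose a value for a one-shot consensus object; return the decision."""
    target = os.path.join(base_dir, object_name)

    # Step 1: private directory with a unique name (uuid4 is an assumption).
    private = os.path.join(base_dir, "tmp-" + uuid.uuid4().hex)
    os.mkdir(private)

    # Step 2: an empty file whose name is this process's input.
    marker = os.path.join(private, proposal)
    open(marker, "x").close()

    # Step 3: try to publish. rename() will not replace a non-empty
    # directory, so only the first rename to `target` can succeed.
    try:
        os.rename(private, target)
    except OSError as exc:
        if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
            raise
        # We lost the race: remove our own private directory.
        os.unlink(marker)
        os.rmdir(private)

    # Step 4: the single file name inside the published directory
    # is the decision, for winners and losers alike.
    (winner,) = os.listdir(target)
    return winner
```

Calling `decide(base, "obj", "a")` and then `decide(base, "obj", "b")` returns `"a"` both times. A process that dies between steps 1 and 3 leaves its `tmp-…` directory behind, which is exactly the garbage that still has to be collected.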
If you include fcntl(2) record locking in your list of atomic operations, you can easily build a general mutex primitive. I regularly use the flock(1) command-line tool to do this in shell scripts. (flock(1) is part of the util-linux package, formerly util-linux-ng.)
flock(2) is not specified by POSIX but fcntl(2) is. flock(1) is built on flock(2), though I think flock(2) locks may end up being implemented as fcntl(2) locks in some cases (e.g. on NFS).
So the algorithm is something like:
- Try to take a non-blocking exclusive fcntl() lock (F_SETLK) on some file that is unique to the resource you want to manipulate. This may be the data file itself, or some empty file that every process agrees to use as the mutex object.
- If the lock is acquired, swap the data in the file.
- If the lock is not acquired, don't touch the data file.
- Release the fcntl() lock on the file.
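As a rough illustration of these steps, here is a minimal Python sketch. It assumes a dedicated lock file separate from the data file, and it "swaps" the data by writing a temporary file and renaming it over the old one. That is one possible choice, not something the steps above prescribe. The name `try_locked_update` and the `.tmp` suffix are hypothetical. Python's `fcntl.lockf()` is a wrapper around the fcntl(2) locking calls.

```python
import fcntl
import os


def try_locked_update(lock_path: str, data_path: str, new_data: bytes) -> bool:
    """Return True if the lock was taken and the data replaced, else False."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Step 1: non-blocking exclusive lock on the agreed lock file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Lock not acquired: another process holds it, so leave
            # the data file alone.
            return False

        # Lock acquired: replace the data. Writing a temporary file and
        # renaming it means readers never see a half-written file
        # (an assumption of this sketch).
        tmp_path = data_path + ".tmp"
        with open(tmp_path, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, data_path)

        # Final step: release the lock explicitly.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Closing the descriptor would also drop the lock.
        os.close(fd)
```

Because the kernel drops fcntl() locks when the holding process exits, a process that dies while holding the lock does not block the others forever. Note also that these locks are per process, so a second call from the same process will not be refused.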
You could of course take a blocking fcntl(2) lock (F_SETLKW) instead, but there is no way to know in what order the waiting processes will be woken up, so whether this is appropriate depends on the application.
Note that fcntl(2) locks are advisory, so they won't stop a process that ignores the lock from manipulating the data file.