Is file append atomic in UNIX?

Posted 2019-01-02 17:02

In general, what can we take for granted when we append to a file in UNIX from multiple processes? Is it possible to lose data (one process overwriting the other's changes)? Is it possible for data to get mangled? (For example, each process is appending one line per append to a log file, is it possible that two lines get mangled?) If the append is not atomic in the above sense, then what's the best way of ensuring mutual exclusion?

4 answers
裙下三千臣
#2 · 2019-01-02 17:40

A write of fewer than PIPE_BUF bytes is supposed to be atomic. PIPE_BUF must be at least 512 bytes, though it can easily be larger (Linux seems to have it set to 4096).
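To check the value on a particular system, the limit can be queried at runtime. Here is a minimal sketch in Python for POSIX systems; the helper name pipe_buf_size is only for illustration, and it asks about a pipe because that is the object POSIX defines PIPE_BUF for:

```python
import os

def pipe_buf_size():
    """Return PIPE_BUF as reported for a freshly created pipe."""
    read_fd, write_fd = os.pipe()
    try:
        # fpathconf() reports the limit that applies to this descriptor.
        return os.fpathconf(write_fd, "PC_PIPE_BUF")
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    print("PIPE_BUF =", pipe_buf_size())
```

The number this prints says nothing by itself about regular files; it only tells you the size the rest of this answer is reasoning about.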

This assumes that you're talking to fully POSIX-compliant components. For instance, it isn't true on NFS.

But assuming you write to a log file you opened in O_APPEND mode and keep your lines (including the newline) under PIPE_BUF bytes long, you should be able to have multiple writers to a log file without any corruption issues. Any interrupts will arrive before or after the write, not in the middle. If you want file integrity to survive a reboot, you'll also need to call fsync(2) after every write, but that's terrible for performance.
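As a rough illustration of that pattern, here is a sketch in Python for POSIX systems. The function name append_line and the MAX_LINE constant are made up for this example, and 512 is used as a conservative stand-in for PIPE_BUF. The important part is that each log line goes out in exactly one write() call on a descriptor opened with O_APPEND:

```python
import os

MAX_LINE = 512  # assumption: the smallest PIPE_BUF that POSIX permits

def append_line(path, text):
    """Append one line to path with a single write() on an O_APPEND descriptor."""
    data = text.rstrip("\n").encode("utf-8") + b"\n"
    if len(data) >= MAX_LINE:
        raise ValueError("line too long for a single small append")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)  # one system call per log line
        if written != len(data):
            raise OSError("short write: line may be incomplete")
        # os.fsync(fd)  # enable if lines must survive a reboot (slow)
    finally:
        os.close(fd)
```

Buffered I/O layers (for example Python's open() in text mode, or C stdio) may split or merge lines into differently sized write() calls, which is why the sketch calls os.write directly.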

Clarification: read the comments and Oz Solomon's answer. I'm not sure that O_APPEND is supposed to have that PIPE_BUF size atomicity. It's entirely possible that it's just how Linux implemented write(), or it may be due to the underlying filesystem's block sizes.

一个人的天荒地老
#3 · 2019-01-02 17:46

Here is what the standard says: http://www.opengroup.org/onlinepubs/009695399/functions/pwrite.html.

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
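One visible consequence of that wording is that the current file offset does not matter for a descriptor opened with O_APPEND. The following Python sketch (POSIX only; the function name is invented for the example) seeks back to the start of the file before the second write and still ends up appending:

```python
import os
import tempfile

def append_ignores_offset():
    """Show that a write on an O_APPEND descriptor lands at end of file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        try:
            os.write(fd, b"first ")
            os.lseek(fd, 0, os.SEEK_SET)  # move the offset back to the start
            os.write(fd, b"second")       # appended anyway, nothing is overwritten
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)

if __name__ == "__main__":
    print(append_ignores_offset())
```

This only demonstrates the offset behaviour within one process; it does not by itself prove anything about atomicity across processes.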

余生无你
#4 · 2019-01-02 17:53

I wrote a script to empirically test the maximum atomic append size. The script, written in bash, spawns multiple worker processes which all write worker-specific signatures to the same file. It then reads the file, looking for overlapping or corrupted signatures. You can see the source for the script at this blog post.
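The bash script itself is not reproduced here. As a rough sketch of the same idea in Python (POSIX only, since it uses fork), the code below is not the original script: the function names, worker count and record size are all assumptions chosen for illustration. Each worker appends fixed-size records filled with its own byte, and the checker counts records that contain a mix of bytes or are cut short:

```python
import os
import tempfile

def count_torn(data, record_size):
    """Count records that are not made of one repeated byte."""
    torn = 0
    for start in range(0, len(data), record_size):
        record = data[start:start + record_size]
        if len(record) != record_size or record != record[:1] * record_size:
            torn += 1
    return torn

def run_append_test(path, workers=4, records=200, record_size=256):
    """Fork workers that each append their own signature; return the file bytes."""
    pids = []
    for i in range(workers):
        pid = os.fork()
        if pid == 0:  # child: append records filled with a worker-specific byte
            status = 1
            try:
                fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
                signature = bytes([ord("a") + i]) * record_size
                for _ in range(records):
                    os.write(fd, signature)
                os.close(fd)
                status = 0
            finally:
                os._exit(status)  # never return into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    with open(path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        for size in (64, 512, 4096, 65536):
            os.truncate(path, 0)
            data = run_append_test(path, record_size=size)
            print(size, "bytes per append:", count_torn(data, size), "torn records")
    finally:
        os.unlink(path)
```

A run with zero torn records does not prove atomicity at that size; it only fails to find a counterexample on that machine and filesystem, so larger worker and record counts give more convincing results.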

The actual maximum atomic append size varies not only by OS, but by filesystem.

On Linux+ext3 the size is 4096, and on Windows+NTFS the size is 1024. See the comments below for more sizes.

君临天下
#5 · 2019-01-02 17:59

Edit: Updated August 2017 with latest Windows results.

I'm going to give you an answer with links to test code and results as the author of proposed Boost.AFIO which implements an asynchronous filesystem and file i/o C++ library.

Firstly, O_APPEND or the equivalent FILE_APPEND_DATA on Windows means that increments of the maximum file extent (file "length") are atomic under concurrent writers. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly; NFS before v5 does not, as it lacks the wire format capability to append atomically. So if you open your file append-only, concurrent writes will not tear with respect to one another on any major OS unless NFS is involved.

However, reads concurrent with atomic appends may see torn writes, depending on the OS, the filing system, and the flags you opened the file with: the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system:


No O_DIRECT/FILE_FLAG_NO_BUFFERING:

Microsoft Windows 10 with NTFS: update atomicity = 1 byte up to and including 10.0.10240; from 10.0.14393, at least 1 MB, probably infinite (*).

Linux 4.2.6 with ext4: update atomicity = 1 byte.

FreeBSD 10.2 with ZFS: update atomicity = at least 1 MB, probably infinite (*).

O_DIRECT/FILE_FLAG_NO_BUFFERING:

Microsoft Windows 10 with NTFS: update atomicity up to and including 10.0.10240 = up to 4096 bytes, but only if page aligned; otherwise 512 bytes if FILE_FLAG_WRITE_THROUGH is off, else 64 bytes. Note that this atomicity is probably a feature of PCIe DMA rather than designed in. Since 10.0.14393, at least 1 MB, probably infinite (*).

Linux 4.2.6 with ext4: update atomicity = at least 1 MB, probably infinite (*). Note that earlier Linuxes with ext4 definitely did not exceed 4096 bytes. XFS certainly used to have custom locking, but it looks like recent Linux has finally fixed this.

FreeBSD 10.2 with ZFS: update atomicity = at least 1 MB, probably infinite (*).


You can see the raw empirical test results at https://github.com/ned14/afio/tree/master/programs/fs-probe. Note we test for torn offsets only on 512 byte multiples, so I cannot say if a partial update of a 512 byte sector would tear during the read-modify-write cycle.

So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes on Linux with ext4 unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple.


(*) "Probably infinite" stems from these clauses in the POSIX spec:

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links ... [many functions] ... read() ... write() ... If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [Source]

and

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. [Source]

but conversely:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control. [Source]

You can read more about the meaning of these in this answer.
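For the "some form of concurrency control" that the last clause recommends, one common option on POSIX systems is an advisory lock taken with flock(). The sketch below is in Python and is only one possible approach; the function name locked_append is invented for the example, and the lock only helps if every writer (and any reader that cares about torn data) takes it too:

```python
import fcntl
import os

def locked_append(path, data):
    """Append data while holding an exclusive advisory lock on the file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            view = memoryview(data)
            while view:                 # loop in case of a short write
                written = os.write(fd, view)
                view = view[written:]
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A cooperating reader would take fcntl.LOCK_SH on its own descriptor before reading. Because the lock is advisory, a process that skips it can still write at any time, and lock behaviour over network filesystems may differ from local ones.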
