In general, what can we take for granted when we append to a file in UNIX from multiple processes? Is it possible to lose data (one process overwriting the other's changes)? Is it possible for data to get mangled? (For example, if each process is appending one line at a time to a log file, is it possible for two lines to get mangled?) If the append is not atomic in the above sense, then what's the best way of ensuring mutual exclusion?
A write that's under the size of 'PIPE_BUF' is supposed to be atomic. That should be at least 512 bytes, though it could easily be larger (Linux seems to have it set to 4096).

This assumes that you're talking about all fully POSIX-compliant components. For instance, it isn't true on NFS.
But assuming you write to a log file you opened in 'O_APPEND' mode and keep your lines (including the newline) under 'PIPE_BUF' bytes long, you should be able to have multiple writers to a log file without any corruption issues. Any interrupts will arrive before or after the write, not in the middle. If you want file integrity to survive a reboot you'll also need to call fsync(2) after every write, but that's terrible for performance.

Clarification: read the comments and Oz Solomon's answer. I'm not sure that 'O_APPEND' is supposed to have that 'PIPE_BUF' size atomicity. It's entirely possible that it's just how Linux implemented write(), or it may be due to the underlying filesystem's block sizes. Here is what the standard says: http://www.opengroup.org/onlinepubs/009695399/functions/pwrite.html.
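To make that recipe concrete, here is a minimal C sketch of the pattern: open the log in 'O_APPEND' mode and emit each line with a single write() call. This is an illustration, not production code; the path and message are placeholders, and the commented-out fsync(2) is the durability trade-off mentioned above.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one complete line to a shared log file.
 * Each line goes out as a single write() on an O_APPEND
 * descriptor, so concurrent writers should not interleave
 * (within the size limits discussed above). */
int log_line(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, line, strlen(line));  /* one call, one line */

    /* Uncomment to survive a crash/reboot, at a large performance cost:
     * fsync(fd);
     */

    close(fd);
    return n == (ssize_t)strlen(line) ? 0 : -1;
}
```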
I wrote a script to empirically test the maximum atomic append size. The script, written in bash, spawns multiple worker processes which all write worker-specific signatures to the same file. It then reads the file, looking for overlapping or corrupted signatures. You can see the source for the script at this blog post.
The actual maximum atomic append size varies not only by OS, but by filesystem.
On Linux+ext3 the size is 4096, and on Windows+NTFS the size is 1024. See the comments below for more sizes.
Edit: Updated August 2017 with latest Windows results.
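The original harness is a bash script, but the same probe is easy to reproduce in C if that's more convenient: fork several workers that each append fixed-size records filled with a worker-specific signature byte, then scan the file for records containing a mix of signatures. A minimal sketch of that idea (the worker count, record size, and file name are arbitrary choices, not values from the tests above):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 4
#define REC_SZ  1024   /* append size being probed */
#define ROUNDS  1000

int main(void)
{
    const char *path = "append_test.log";
    unlink(path);

    for (int w = 0; w < WORKERS; w++) {
        if (fork() == 0) {                        /* child: one worker */
            int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
            char buf[REC_SZ];
            memset(buf, 'A' + w, REC_SZ - 1);     /* worker signature */
            buf[REC_SZ - 1] = '\n';
            for (int i = 0; i < ROUNDS; i++)
                write(fd, buf, REC_SZ);           /* one append per record */
            _exit(0);
        }
    }
    for (int w = 0; w < WORKERS; w++)
        wait(NULL);

    /* Verify: every record must contain exactly one signature byte. */
    FILE *f = fopen(path, "r");
    char rec[REC_SZ];
    long torn = 0;
    while (fread(rec, 1, REC_SZ, f) == REC_SZ)
        for (int i = 1; i < REC_SZ - 1; i++)
            if (rec[i] != rec[0]) { torn++; break; }
    fclose(f);
    printf("torn records: %ld\n", torn);
    return torn != 0;
}
```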
As the author of the proposed Boost.AFIO, a C++ library implementing an asynchronous filesystem and file i/o, I'm going to give you an answer with links to test code and results.
Firstly, O_APPEND or the equivalent FILE_APPEND_DATA on Windows means that increments of the maximum file extent (file "length") are atomic under concurrent writers. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly; NFS before v5 does not, as it lacks the wire format capability to append atomically. So if you open your file with append-only, concurrent writes will not tear with respect to one another on any major OS unless NFS is involved.
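For illustration, here is a minimal sketch of what an append-only open looks like on each family of OS (error handling omitted; the file name is a placeholder):

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

int main(void)
{
#ifdef _WIN32
    /* Windows: requesting only FILE_APPEND_DATA access means every
     * WriteFile lands at the current end of file. */
    HANDLE h = CreateFileA("app.log", FILE_APPEND_DATA,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD written;
    WriteFile(h, "hello\n", 6, &written, NULL);
    CloseHandle(h);
#else
    /* POSIX: O_APPEND makes the seek-to-end plus write one atomic step. */
    int fd = open("app.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    write(fd, "hello\n", 6);
    close(fd);
#endif
    return 0;
}
```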
However concurrent reads to atomic appends may see torn writes depending on OS, filing system, and what flags you opened the file with - the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system:
No O_DIRECT/FILE_FLAG_NO_BUFFERING:
- Microsoft Windows 10 with NTFS: update atomicity = 1 byte until and including 10.0.10240, from 10.0.14393 at least 1Mb, probably infinite (*).
- Linux 4.2.6 with ext4: update atomicity = 1 byte.
- FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*).

O_DIRECT/FILE_FLAG_NO_BUFFERING:
- Microsoft Windows 10 with NTFS: update atomicity = until and including 10.0.10240, up to 4096 bytes only if page aligned, otherwise 512 bytes if FILE_FLAG_WRITE_THROUGH is off, else 64 bytes. Note that this atomicity is probably a feature of PCIe DMA rather than designed in. Since 10.0.14393, at least 1Mb, probably infinite (*).
- Linux 4.2.6 with ext4: update atomicity = at least 1Mb, probably infinite (*). Note that earlier Linuxes with ext4 definitely did not exceed 4096 bytes; XFS certainly used to have custom locking, but it looks like recent Linux has finally fixed this.
- FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*).
You can see the raw empirical test results at https://github.com/ned14/afio/tree/master/programs/fs-probe. Note we test for torn offsets only on 512 byte multiples, so I cannot say if a partial update of a 512 byte sector would tear during the read-modify-write cycle.
So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes on Linux with ext4 unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple.
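If you do go the O_DIRECT route on Linux, note that the buffer address and transfer size generally must be aligned to the logical sector size. Here is a hedged sketch, assuming a 512-byte sector (real code should discover the required alignment, which varies by device and filesystem) and that your kernel/filesystem accepts O_DIRECT combined with O_APPEND:

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Append one sector-sized, sector-aligned record with O_DIRECT.
 * The 512-byte sector size is an assumption for this sketch. */
int append_direct(const char *path, const char *msg)
{
    const size_t sector = 512;
    char *buf;
    if (posix_memalign((void **)&buf, sector, sector) != 0)
        return -1;                       /* aligned buffer required */
    memset(buf, 0, sector);              /* pad record to a full sector */
    strncpy(buf, msg, sector - 2);
    buf[sector - 1] = '\n';

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DIRECT, 0644);
    if (fd == -1) { free(buf); return -1; }

    ssize_t n = write(fd, buf, sector);  /* length is a sector multiple */
    close(fd);
    free(buf);
    return n == (ssize_t)sector ? 0 : -1;
}
```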
(*) "Probably infinite" stems from these clauses in the POSIX spec:
and
but conversely:
You can read more about the meaning of these clauses in this answer.