Does Linux guarantee the contents of a file are flushed to disc on close()?

Posted 2020-01-26 08:48

When a file is closed using close() or fclose() (for example), does Linux guarantee that the file is written back to (persistent) disc?

What I mean is, if close() returns 0 and then immediately afterwards the power fails, are previously written data guaranteed to persist, i.e. be durable?

The fsync() system call does provide this guarantee. Is closing a file also sufficient?

I can't find anything which makes any claim one way or another at the moment.


Question 2:

If close() does implicitly do an fsync(), is there a way of telling it not to?

9 answers
狗以群分
Answer 2 · 2020-01-26 09:19

No. fclose() doesn't imply fsync(). A lot of Linux file systems delay writes and batch them up, which improves overall performance, presumably reduces wear on the disk drive, and improves battery life for laptops. If the OS had to write to disk whenever a file was closed, many of these benefits would be lost.

Paul Tomblin mentioned a controversy in his answer, and explaining the one I've seen won't fit into a comment. Here's what I've heard:

The recent controversy is over the ext4 ordering (ext4 is the proposed successor to the popular ext3 Linux file system). It is customary, in Linux and Unix systems, to change important files by reading the old one, writing out the new one with a different name, and renaming the new one to the old one. The idea is to ensure that either the new one or the old one will be there, even if the system fails at some point. Unfortunately, ext4 appears to be happy to read the old one, rename the new one to the old one, and write the new one, which can be a real problem if the system goes down between steps 2 and 3.

The standard way to deal with this is of course fsync(), but that trashes performance. The real solution is to modify ext4 to keep the ext3 ordering, so that it doesn't rename a file until it has finished writing it out. Apparently this ordering isn't covered by the standard, so it's a quality-of-implementation issue, and ext4's QoI is really lousy here: there is no way to reliably write a new version of a configuration file without either constantly calling fsync(), with all the problems that causes, or risking losing both versions.

霸刀☆藐视天下
Answer 3 · 2020-01-26 09:23

No, close does not perform an fsync(2) and would batter many machines to death if it did so. Many intermediate files are opened and closed by their creator, then opened and closed by their consumer, then deleted, and this very common sequence would require touching the disk if close(2) performed an automatic fsync(2). Instead, the disk is usually not touched and the disk never knows the file was there.

老娘就宠你
Answer 4 · 2020-01-26 09:30

From "man 2 close":

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes.

The man page says that if you want to be sure that your data are on disk, you have to use fsync() yourself.

淡お忘
Answer 5 · 2020-01-26 09:38

You may also be interested in this bug report from the Firebird SQL database regarding fcntl( O_SYNC ) not working on Linux.

In addition, the question you ask implies a potential problem. What do you mean by writing to the disk? Why does it matter? Are you concerned that the power goes out and the file is missing from the drive? Why not use a UPS on the system or the SAN?

If that is the concern, you need a journaling file system - and not just a meta-data journaling file system, but one with full data journaling.

Even in that case you must understand that besides the O/S's involvement, most hard disks lie to you about doing an fsync: fsync just sends the data to the drive, and it is up to the individual operating system to know how to wait for the drive to flush its own caches.

--jeffk++

我欲成王,谁敢阻挡
Answer 6 · 2020-01-26 09:39

It is also important to note that fsync does not guarantee a file is on disk; it just guarantees that the OS has asked the filesystem to flush changes to the disk. The filesystem does not have to write anything to disk.

From man 3 fsync:

If _POSIX_SYNCHRONIZED_IO is not defined, the wording relies heavily on the conformance document to tell the user what can be expected from the system. It is explicitly intended that a null implementation is permitted.

Luckily, all of the common filesystems for Linux do in fact write the changes to disk; unluckily that still doesn't guarantee the file is on the disk. Many hard drives come with write buffering turned on (and therefore have their own buffers that fsync does not flush). And some drives/raid controllers even lie to you about having flushed their buffers.

霸刀☆藐视天下
Answer 7 · 2020-01-26 09:40

We wouldn't have to care about this if the computer/OS had a fault-tolerant file system that guaranteed writes to something surviving a power cycle, at least for the files we put this constraint upon. It doesn't have to be disk if there is some non-volatile RAM or equivalent. Some mainframes of a bygone age, I dimly remember, did have such mechanisms and supposedly did make such guarantees.
