what exactly the python's file.flush() is doin

2019-01-02 15:01发布

问题:

I found this in the Python documentation for File Objects:

flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.

So my question is: what exactly is Python's flush doing? I thought that it forces to write data to the disk, but now I see that it doesn't. Why?

回答1:

There's typically two levels of buffering involved:

  1. Internal buffers
  2. Operating system buffers

The internal buffers are buffers created by the runtime/library/language that you're programming against and is meant to speed things up by avoiding system calls for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.

However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.

If you write something, and it ends up in the buffer (only), and the power is cut to your machine, that data is not on disk when the machine turns off.

So, in order to help with that you have the flush and fsync methods, on their respective objects.

The first, flush, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.

Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been "permanently" stored on disk.

To do that, you need to call the os.fsync method which ensures all operating system buffers are synchronized with the storage devices they're for, in other words, that method will copy data from the operating system buffers to the disk.

Typically you don't need to bother with either method, but if you're in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.


Addendum in 2018.

Note that disks with cache mechanisms is now much more common than back in 2013, so now there are even more levels of caching and buffers involved. I assume these buffers will be handled by the sync/flush calls as well, but I don't really know.



回答2:

Because the operating system may not do so. The flush operation forces the file data into the file cache in RAM, and from there it's the OS's job to actually send it to the disk.



回答3:

It flushes the internal buffer, which is supposed to cause the OS to write out the buffer to the file.[1] Python uses the OS's default buffering unless you configure it do otherwise.

But sometimes the OS still chooses not to cooperate. Especially with wonderful things like write-delays in Windows/NTFS. Basically the internal buffer is flushed, but the OS buffer is still holding on to it. So you have to tell the OS to write it to disk with os.fsync() in those cases.

[1] http://docs.python.org/library/stdtypes.html



回答4:

Basically, flush() cleans out your RAM buffer, its real power is that it lets you continue to write to it afterwards - but it shouldn't be thought of as the best/safest write to file feature. It's flushing your RAM for more data to come, that is all. If you want to ensure data gets written to file safely then use close() instead.



标签: