I had an interesting experience with Python's file buffering and want to make sure I understand it correctly. Given:
[Python 2.7 shell]
...
model = (really big Numpy model)
f = open('file.out','w')
pickle.dump(model, f)
(pickle.dump() finishes while I'm doing other things)
[Bash shell]
$ ls -l
-rw-r--r-- 1 john staff 270655488 Dec 6 21:32 file.out
[Return to Python shell]
model = (different really big Numpy model)
f = open('newfile.out','w')
pickle.dump(model,f)
(pickle.dump() finishes)
[Bash shell]
$ ls -l
-rw-r--r-- 1 john staff 270659455 Dec 7 07:09 file.out
-rw-r--r-- 1 john staff 270659451 Dec 6 20:48 newfile.out
Note file.out is now a different size.
Now, I know that Python's file buffering defaults to the system buffer size (I'm on Mac OS X), so it seems there were still 3,967 bytes sitting in the buffer while I was doing other things, which makes sense because the default buffer size on OS X is larger than that.
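(For comparison, my understanding is that the only way to be certain every byte is on disk is an explicit flush and, if it really matters, an fsync. The file name and the dummy data below are just stand-ins for what I actually had:)

import os
import pickle

model = {'weights': range(10)}   # stand-in for the real Numpy model
f = open('file.out', 'wb')       # 'wb' since pickle output is binary
pickle.dump(model, f)
f.flush()                        # push Python's buffer out to the OS
os.fsync(f.fileno())             # ask the OS to commit its cache to disk
f.close()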
What interested me was that I had reassigned the file object 'f' to another open file without ever calling f.close() (honestly, I was just working fast to test something else and forgot). When I looked at the file size, I half expected it to have stayed the same, which would have meant the last buffered chunk of output was lost.
So, the question is whether this is a safe procedure. Is reassignment handled in such a way that either the Python garbage collector or the file object itself flushes the buffer and closes the file when the variable is suddenly rebound, even if you never call the close() method? More importantly, is this always the case, or is it possible that rebinding the variable actually did, or in another situation might, discard the buffered bytes before they were flushed to the file?
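Here is a minimal version of what I could run to test it. The file names and the dummy data are just placeholders; the question is whether the second size check is guaranteed to show the full pickle, or whether I'm only relying on CPython's reference counting being nice to me:

import os
import pickle

data = {'x': list(range(1000000))}   # stand-in for the big Numpy model

f = open('first.out', 'wb')
pickle.dump(data, f)
print(os.path.getsize('first.out'))  # likely short: the tail is still buffered

# Rebind f without calling f.close(); the old file object loses its
# last reference right here.
f = open('second.out', 'wb')
pickle.dump(data, f)

print(os.path.getsize('first.out'))  # full size now, but is that guaranteed?
f.close()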
I guess it's really a question of how gracefully and safely file objects and the Python garbage collector behave when you yank objects around without properly closing them.
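I do know what the by-the-book version looks like; the question is only about what happens when I skip it. Something like:

import pickle

model = {'weights': [0.0] * 100}   # stand-in for the real model

# The context manager guarantees the flush and close, even on exceptions.
with open('file.out', 'wb') as f:
    pickle.dump(model, f)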