I am trying to understand the trade-offs/differences between these two ways of opening files for line-by-line processing:
    with open('data.txt') as inf:
        for line in inf:
            # etc
vs
    for line in open('data.txt'):
        # etc
I understand that using with ensures the file is closed when the "with-block" (suite?) is exited (or an exception is encountered). So I have been using with ever since I learned about it here.
Regarding the for loop: from searching around the net and SO, it seems that whether the file is closed when the for loop is exited is implementation dependent? And I couldn't find anything about how this construct would deal with exceptions. Does anyone know?
If I am mistaken about anything above, I'd appreciate corrections. Otherwise, is there a reason to ever use the for construct over the with? (Assuming you have a choice, i.e., aren't limited by Python version.)
The problem with the second form (looping directly over open('data.txt')) is that you don't keep an explicit reference to the open file, so how do you close it? The lazy way is to wait for the garbage collector to clean it up, but that may mean that the resources aren't freed in a timely manner.
So you can say
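The snippet that belonged here did not survive, so what follows is only a minimal sketch of keeping an explicit reference and closing it by hand. The temporary file stands in for data.txt, and lines_seen is a hypothetical placeholder for the real per-line work.

```python
import os
import tempfile

# Hypothetical stand-in for data.txt so the sketch runs anywhere.
fd, path = tempfile.mkstemp(suffix=".txt")
tmp = os.fdopen(fd, "w")
tmp.write("first\nsecond\n")
tmp.close()

inf = open(path)      # keep an explicit reference to the file object
lines_seen = []
for line in inf:
    lines_seen.append(line.rstrip("\n"))  # placeholder for "# etc"
inf.close()           # explicit close, reached only if the loop finishes normally

os.remove(path)
```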
Now what happens if there is an exception while you are inside the for loop? The file won't get closed explicitly.
Add a try/finally:
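A sketch of what the try/finally version might look like, again assuming a temporary file in place of data.txt. The int() parsing is a made-up example of per-line work that can fail, and the outer try/except exists only so the demonstration can keep going after the error.

```python
import os
import tempfile

# Hypothetical stand-in for data.txt; the second line is deliberately bad.
fd, path = tempfile.mkstemp(suffix=".txt")
tmp = os.fdopen(fd, "w")
tmp.write("1\noops\n3\n")
tmp.close()

total = 0
caught = None
try:
    inf = open(path)
    try:
        for line in inf:
            total += int(line)   # raises ValueError on "oops"
    finally:
        inf.close()              # runs on normal exit and when an exception escapes
except ValueError as exc:
    caught = exc                 # the exception still propagates past the finally

os.remove(path)
```

The finally clause closes the file before the ValueError reaches the outer handler, so the file is never left open.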
This is a lot of code to do something pretty simple, so Python added the with statement to enable this code to be written in a more readable way. Which gets us back to the with form shown in the question. So, that is the preferred way to open the file. If your Python is too old for the with statement, you should use the try/finally version for production code.
The with statement was only introduced in Python 2.5; only if you have backward compatibility requirements for earlier versions should you use the latter.
Bit more clarity
The with statement was introduced (as you're aware) to encompass the try/except/finally pattern, which isn't terrific to understand, but okay. In CPython (the reference implementation, written in C), the for-loop form will in practice close the file once the file object is no longer referenced. The language specification itself doesn't promise this, so IronPython, Jython, etc. may choose to keep files open, memory open, whatever, and not free resources until the next GC cycle (or at all; the CPython GC is different from the .NET or Java ones).
I think the only thing I've heard against it is that it adds another indentation level.
So to summarise: it won't work before Python 2.5, it introduces the 'as' keyword, and it adds an indentation level.
Otherwise, you stay in control of handling exceptions as normal, and the finally block closes resources if something escapes.
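To illustrate that last point, here is a small sketch under stated assumptions: the temporary file stands in for data.txt and the int() parsing is a made-up source of errors. The exception is handled by ordinary try/except around the with block, and the file is already closed by the time the handler runs.

```python
import os
import tempfile

# Hypothetical stand-in for data.txt; the second line is deliberately bad.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("1\noops\n3\n")

total = 0
caught = None
try:
    with open(path) as inf:
        for line in inf:
            total += int(line)   # raises ValueError on "oops"
except ValueError as exc:
    caught = exc                 # handled as normal, outside the with block

# The name inf is still bound after the block, but the file is closed.
was_closed = inf.closed
os.remove(path)
```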
Works for me!