Is it safe to read some lines with readline() and also use for line in file, and is it guaranteed to use the same file position?
Usually, I want to disregard the first line (headers), so I do this:
FI = open("myfile.txt")
FI.readline() # disregard the first line
for line in FI:
    my_process(line)
FI.close()
Is this safe, i.e., is it guaranteed that the same file position variable is used while iterating lines?
No, it isn't safe:
You could use next() to skip the first line here. You should also test for StopIteration, which will be raised if the file is empty.
It is safe if the mechanisms are under control.
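To make the next() suggestion above concrete, here is a minimal sketch, assuming Python 3. The names process_skipping_header and handle are made up for illustration, and io.StringIO stands in for a real file:

```python
import io


def process_skipping_header(lines, handle):
    """Skip the first item of `lines`, then pass each remaining line to `handle`.

    Returns the number of lines handled. `lines` can be an open file or any
    other iterable of lines.
    """
    it = iter(lines)
    try:
        next(it)  # disregard the first line (the header)
    except StopIteration:
        return 0  # empty input: there was no header to skip
    count = 0
    for line in it:
        handle(line)
        count += 1
    return count


# Illustrative use with an in-memory "file".
seen = []
n = process_skipping_header(io.StringIO("header\nrow1\nrow2\n"), seen.append)
```

If an empty file should simply be ignored, next(it, None) does the same job without the try/except.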
=============================
.
There is no problem with iterating after a readline() call.
But there is one with calling readline() after an iteration.
I created a 'rara.txt' file with this text (each line has a length of 5 because of the '\r\n' line ending under Windows)
And I executed
The result is
.
A strange thing is that if we renew the "cursor" by tell(), readline() works again after an iteration (I don't know the behind-the-scenes mechanism of this "cursor" renewal):
result
Anyway, note that even if the algorithm reads only 4 lines during the iteration (thanks to the counter cnt), the cursor is already at the end of the file from the very beginning of the iteration: the whole file ahead of the position where the iteration begins is read at once.
So pos = FI.tell() before the break doesn't give the position after the 4 lines read, but the position of the end of the file.
.
We must do something special if we want to call readline() again after an iteration, starting from the exact point where the 4-line read ended:
result
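One way to do that "something special" is to count the bytes consumed by hand and seek() back to that offset. This is only a sketch of the idea under my own assumptions (Python 3, binary mode), not necessarily the code that produced the result above. The helper names and the sample content are hypothetical:

```python
import os
import tempfile


def readline_after_n_lines(path, n):
    """Iterate over the first `n` lines, then readline() from exactly there."""
    with open(path, "rb") as fi:  # binary mode: len(line) is a byte count
        pos = fi.tell()
        for cnt, line in enumerate(fi, start=1):
            pos += len(line)  # track the offset ourselves
            if cnt == n:
                break
        fi.seek(pos)  # put the cursor just after the n-th line
        return fi.readline()


def make_sample_file():
    """Write a small file whose lines are 5 bytes each ('\\r\\n' endings)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"abc\r\ndef\r\nghi\r\njkl\r\nmno\r\npqr\r\n")
    return path
```

Counting bytes this way does not rely on what tell() reports in the middle of an iteration, which is the part that behaved unexpectedly above.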
.
All these manipulations are possible only because the file was opened in binary mode. I am on Windows, which writes '\r\n' as the line ending even when told to write (in 'w' mode) something like 'abcdef\n',
while on the other hand Python (in 'r' mode) converts every '\r\n' into '\n'.
That's a mess, and to control all this, files must be opened in 'rb' if we want to do precise manipulations.
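The newline translation is easy to see by reading the same bytes in both modes. A small sketch, assuming Python 3 with its default universal-newlines handling in text mode. The helper names are made up, and the file is written in binary so the '\r\n' is there on any platform:

```python
import os
import tempfile


def first_line_both_modes(path):
    """Return the first line as read in 'rb' mode and as read in 'r' mode."""
    with open(path, "rb") as fb:
        raw = fb.readline()  # bytes exactly as stored on disk
    with open(path, "r", encoding="ascii") as ft:
        text = ft.readline()  # text mode translates '\r\n' to '\n'
    return raw, text


def write_crlf_sample():
    """Create a temporary file containing two '\\r\\n'-terminated lines."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"abc\r\ndef\r\n")
    return path
```

The two results differ in length by one (5 bytes versus 4 characters), so offsets computed from text-mode lines do not match byte positions in the file.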
.
You know what? I love these games with positions in a file.
This works out well in the long run. It ignores the fact that you're processing a file, and works with any sequence. Also, having the explicit iterator object (rdr) hanging around allows you to skip lines inside the body of the for loop without messing anything up.
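A rough illustration of the explicit-iterator idea, assuming Python 3. Only the name rdr comes from the text above. The function name and the rule "a line starting with '#' means skip the next line" are invented for the example:

```python
import io


def collect_records(source):
    """Skip the header, then collect lines, skipping some inside the loop."""
    rdr = iter(source)  # explicit iterator; works for files or any sequence
    next(rdr, None)  # drop the header; None default avoids StopIteration
    records = []
    for line in rdr:
        if line.startswith("#"):
            # Hypothetical rule: a marker line means "ignore the next line".
            next(rdr, None)
            continue
        records.append(line.rstrip("\n"))
    return records


# Works the same on an in-memory file and on a plain list of strings.
from_file = collect_records(io.StringIO("header\na\n#marker\nb\nc\n"))
from_list = collect_records(["header\n", "a\n", "#marker\n", "b\n", "c\n"])
```

Because the for loop and the next() calls share the same rdr object, a line consumed inside the body is not seen again by the loop.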