Is it safe to read some lines with readline() and also use for line in file, and is it guaranteed to use the same file position?
Usually, I want to disregard the first line (headers), so I do this:
FI = open("myfile.txt")
FI.readline() # disregard the first line
for line in FI:
    my_process(line)
FI.close()
Is this safe, i.e., is it guaranteed that the same file position variable is used while iterating lines?
No, it isn't safe. The Python 2 documentation for file.next() says:
"As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right."
You could use next() to skip the first line here. You should also test for StopIteration, which will be raised if the file is empty.
with open('myfile.txt') as f:
    try:
        header = next(f)
    except StopIteration:
        print "File is empty"
    for line in f:
        # do stuff with line
        pass
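If the empty-file case needs no special message, the two-argument form of the built-in next() returns a default instead of raising StopIteration, so the try/except can be dropped. Here is a minimal sketch in Python 3 syntax. The helper names skip_header and body_lines are hypothetical, and io.StringIO stands in for an open file.

```python
import io

def skip_header(lines):
    """Consume the first item of an iterator; return it, or None if there is none."""
    return next(lines, None)

def body_lines(stream):
    """Return the lines after the header (an empty list for an empty stream)."""
    it = iter(stream)
    header = skip_header(it)
    if header is None:
        return []          # empty input: nothing to process
    return [line.rstrip("\n") for line in it]

print(body_lines(io.StringIO("name,age\nann,30\nbob,25\n")))
print(body_lines(io.StringIO("")))
```

Both the header and the body come from the same iterator, so no second reading mechanism is involved.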
This works out well in the long run. It ignores the fact that you're processing a file, and works with any iterable. Also, having the explicit iterator object (rdr) hanging around allows you to skip lines inside the body of the for loop without messing anything up.
with open("myfile.txt", "r") as source:
    rdr = iter(source)
    heading = next(rdr)
    for line in rdr:
        process(line)
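To illustrate skipping lines inside the loop body, here is a small sketch in Python 3 syntax. The "#skip" marker convention and the function name process_records are made up for the example, and io.StringIO stands in for an open file.

```python
import io

def process_records(source):
    """Skip the heading, and drop the line that follows every '#skip' marker."""
    rdr = iter(source)
    next(rdr, None)            # discard the heading; tolerate empty input
    kept = []
    for line in rdr:
        if line.startswith("#skip"):
            next(rdr, None)    # consume the following line from the same iterator
            continue
        kept.append(line.strip())
    return kept

sample = io.StringIO("heading\na\n#skip\nb\nc\n")
print(process_records(sample))   # ['a', 'c']
```

Because the for loop and the explicit next() calls share one iterator, the loop simply resumes after whatever next() consumed.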
It is safe if the mechanisms are under control.
=============================
.
Iterating after a readline() call causes no problem,
but calling readline() after an iteration does.
I created a file 'rara.txt' with the following text (each line has a length of 5 because of the '\r\n' line ending under Windows):
1AA
2BB
3CC
4DD
5EE
6FF
7GG
8HH
9II
10j
11k
12l
13m
14n
15o
And I executed
FI = open("rara.txt",'rb')
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell(),'\n'
cnt = 0
for line in FI:
    cnt += 1
    print 'cnt=='+str(cnt)+' '+repr(line)+' len=='+str(len(line))+\
          " FI.tell() after 'line in FI' : ",FI.tell()
    if cnt==4:
        break
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell()
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell(),'\n'
for line in FI:
    print 'cnt=='+str(cnt)+' '+repr(line)+' len=='+str(len(line))+\
          " FI.tell() after 'line in FI' : ",FI.tell()
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
The result is
'1AA\r\n' len==5 FI.tell() after FI.readline() : 5
cnt==1 '2BB\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==2 '3CC\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==3 '4DD\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==4 '5EE\r\n' len==5 FI.tell() after 'line in FI' : 75
FI.tell() after iteration 'for line in FI' : 75
Traceback (most recent call last):
File "E:\Python\NNN codes\esssssai.py", line 16, in <module>
lineR = FI.readline()
ValueError: Mixing iteration and read methods would lose data
.
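A strange thing is that if we reposition the "cursor" with seek(), using a position obtained from tell(), readline() works again after an iteration. I don't know the behind-the-scenes mechanism of this "cursor" renewal; the Python 2 documentation for file.next() only states that using seek() to reposition the file to an absolute position flushes the read-ahead buffer: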
FI = open("rara.txt",'rb')
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell(),'\n'
cnt = 0
for line in FI:
    cnt += 1
    print 'cnt=='+str(cnt)+' '+repr(line)+' len=='+str(len(line))+\
          " FI.tell() after 'line in FI' : ",FI.tell()
    if cnt==4:
        pos = FI.tell()
        break
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
FI.seek(pos)
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell()
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell(),'\n'
for line in FI:
    print 'cnt=='+str(cnt)+' '+repr(line)+' len=='+str(len(line))+\
          " FI.tell() after 'line in FI' : ",FI.tell()
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
result
'1AA\r\n' len==5 FI.tell() after FI.readline() : 5
cnt==1 '2BB\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==2 '3CC\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==3 '4DD\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==4 '5EE\r\n' len==5 FI.tell() after 'line in FI' : 75
FI.tell() after iteration 'for line in FI' : 75
'' len==0 FI.tell() after FI.readline() : 75
'' len==0 FI.tell() after FI.readline() : 75
FI.tell() after iteration 'for line in FI' : 75
Note that although the loop reads only 4 lines (thanks to the counter cnt), tell() already reports the end of the file from the very first iteration: the read-ahead buffer has consumed, in one go, everything between the starting position and the end of this small 75-byte file.
So pos = FI.tell() before the break does not give the position after the 4 lines read, but the position of the end of the file.
.
We must do something special if we want to call readline() again after an iteration, starting from the exact point where the iteration stopped after reading 4 lines. Here the position is tracked by hand, by adding len(line) to pos for each line:
FI = open("rara.txt",'rb')
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell(),'\n'
cnt = 0
pos = FI.tell()
for line in FI:
    cnt += 1
    pos += len(line)
    print 'cnt=='+str(cnt)+' '+repr(line)+' len=='+str(len(line))+\
          " FI.tell() after 'line in FI' : ",FI.tell()
    if cnt==4:
        break
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell()
print " pos after iteration 'for line in FI' : ",pos,'\n'
FI.seek(pos)
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell()
lineR = FI.readline()
print repr(lineR)+' len=='+str(len(lineR))+\
      ' FI.tell() after FI.readline() : ',FI.tell(),'\n'
cnt = 0
for line in FI:
    cnt += 1
    print 'cnt=='+str(cnt)+' '+repr(line)+' len=='+str(len(line))+\
          " FI.tell() after 'line in FI' : ",FI.tell()
print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
result
'1AA\r\n' len==5 FI.tell() after FI.readline() : 5
cnt==1 '2BB\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==2 '3CC\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==3 '4DD\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==4 '5EE\r\n' len==5 FI.tell() after 'line in FI' : 75
FI.tell() after iteration 'for line in FI' : 75
pos after iteration 'for line in FI' : 25
'6FF\r\n' len==5 FI.tell() after FI.readline() : 30
'7GG\r\n' len==5 FI.tell() after FI.readline() : 35
cnt==1 '8HH\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==2 '9II\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==3 '10j\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==4 '11k\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==5 '12l\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==6 '13m\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==7 '14n\r\n' len==5 FI.tell() after 'line in FI' : 75
cnt==8 '15o\r\n' len==5 FI.tell() after 'line in FI' : 75
FI.tell() after iteration 'for line in FI' : 75
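The same bookkeeping can be wrapped in a small function. This is only a sketch in Python 3 syntax: the name read_n_lines_then_resume is hypothetical, and it illustrates the "count the bytes yourself, then seek()" idea rather than reproducing the Python 2 tell() values shown above.

```python
import os
import tempfile

def read_n_lines_then_resume(path, n):
    """Iterate over n lines, then continue with readline() from a tracked offset."""
    with open(path, "rb") as f:
        pos = f.tell()
        iterated = []
        for line in f:
            pos += len(line)      # bytes consumed so far, counted by hand
            iterated.append(line)
            if len(iterated) == n:
                break
        f.seek(pos)               # absolute seek to the tracked offset
        return iterated, pos, f.readline()

# Build a small CRLF file to try it on (contents are made up for the example).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"1AA\r\n2BB\r\n3CC\r\n4DD\r\n")
try:
    print(read_n_lines_then_resume(tmp.name, 2))
finally:
    os.remove(tmp.name)
```

The file is opened in binary mode, so each len(line) is an exact byte count and pos is a valid argument for seek().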
.
All these manipulations are possible only because the file was opened in binary mode. I am on Windows, where a file written in text mode ('w') gets '\r\n' as its line ending even when it is told to write something like 'abcdef\n',
while on the other hand Python converts every '\r\n' to '\n' when reading in mode 'r'. In text mode, len(line) would therefore undercount the bytes actually consumed, and a pos computed that way would point to the wrong place.
That's a mess, and to control all this, files must be opened in 'rb' if we want to do precise manipulations.
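The difference between the two modes can be checked with a short sketch in Python 3 syntax. The function name line_length_totals and the sample contents are made up for the example. It relies on Python 3's default universal-newlines behaviour, where reading in text mode translates '\r\n' to '\n' on every platform.

```python
import os
import tempfile

def line_length_totals(path):
    """Return (bytes counted in 'rb' mode, characters counted in 'r' mode)."""
    with open(path, "rb") as f:
        binary_total = sum(len(line) for line in f)
    with open(path, "r") as f:    # default newline=None translates '\r\n' to '\n'
        text_total = sum(len(line) for line in f)
    return binary_total, text_total

# Three 5-byte CRLF lines, like the first lines of rara.txt.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"1AA\r\n2BB\r\n3CC\r\n")
try:
    print(line_length_totals(tmp.name))   # (15, 12)
finally:
    os.remove(tmp.name)
```

The text-mode total is one short per line, which is exactly the drift that would make a hand-computed offset unusable with seek().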
.
You know what? I love these games with the positions in a file.