Python's f.tell() doesn't behave as I expected when you iterate over a file with f.next():
>>> f=open(".bash_profile", "r")
>>> f.tell()
0
>>> f.next()
"alias rm='rm -i'\n"
>>> f.tell()
397
>>> f.next()
"alias cp='cp -i'\n"
>>> f.tell()
397
>>> f.next()
"alias mv='mv -i'\n"
>>> f.tell()
397
It looks like tell() reports the position of the read-ahead buffer rather than the position just after the line you got from next().
I've previously used the seek/tell trick to rewind one line when iterating over a file with readline(). Is there a way to rewind one line when using next()?
No. I would write an adapter that forwards most calls to the underlying file but keeps a copy of the last line returned by next(), with a separate method that makes that line come out again on the following call.
I would actually make the adapter wrap any iterable rather than just files, because that would be useful in plenty of other contexts.
Alex's suggestion of using itertools.tee also works, but I think writing your own iterator adapter to handle this case in general is cleaner.
Here is an example:
class rewindable_iterator(object):
    # Sentinel meaning "next() has not been called yet".
    not_started = object()

    def __init__(self, iterator):
        self._iter = iter(iterator)
        self._use_save = False
        self._save = self.not_started

    def __iter__(self):
        return self

    def next(self):
        if self._use_save:
            # Replay the saved value once after backup().
            self._use_save = False
        else:
            self._save = self._iter.next()
        return self._save

    def backup(self):
        if self._use_save:
            raise RuntimeError("Tried to backup more than one step.")
        elif self._save is self.not_started:
            raise RuntimeError("Can't backup past the beginning.")
        self._use_save = True

# process_line and DoOver stand in for your own logic.
fiter = rewindable_iterator(file('file.txt', 'r'))
for line in fiter:
    result = process_line(line)
    if result is DoOver:
        fiter.backup()
This wouldn't be too hard to extend into something that lets you back up by more than one value.
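For what it's worth, here is a minimal sketch of that extension, written for Python 3 (the code above is Python 2). The class name RewindableIterator, the history parameter, and the steps argument are my own inventions for illustration and not part of any library. It keeps the last few items in a collections.deque and replays them after backup():

```python
from collections import deque


class RewindableIterator:
    """Wrap any iterable and allow stepping back over recent items.

    `history` (a hypothetical parameter) caps how many items are remembered.
    """

    def __init__(self, iterable, history=3):
        self._iter = iter(iterable)
        self._seen = deque(maxlen=history)  # most recent items handed out
        self._pending = []                  # rewound items; the last one is replayed first

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending:
            item = self._pending.pop()
        else:
            item = next(self._iter)
        self._seen.append(item)
        return item

    next = __next__  # Python 2 spelling of the same method

    def backup(self, steps=1):
        if steps > len(self._seen):
            raise RuntimeError("Tried to back up further than the saved history.")
        for _ in range(steps):
            # Move the newest remembered item onto the replay stack.
            self._pending.append(self._seen.pop())
```

Usage would look much the same as before: call fiter.backup(2) to see the last two lines again. The trade-off is memory, since the wrapper holds on to up to `history` items.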
itertools.tee is probably the least-bad approach -- you can't "defeat" the buffering done by iterating on the file (nor would you want to: the performance effects would be terrible), so keeping two iterators, one "one step behind" the other, seems the soundest solution to me.
import itertools as it

with open('a.txt') as f:
    f1, f2 = it.tee(f)
    f2 = it.chain([None], f2)
    for thisline, prevline in it.izip(f1, f2):
        ...
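If you are on Python 3, itertools.izip is gone and the built-in zip is already lazy, so the same idea might look like the sketch below. The helper name with_previous is made up for this example, and a plain list of lines stands in for the open file:

```python
import itertools


def with_previous(iterable):
    """Yield (current, previous) pairs; previous is None for the first item."""
    current, previous = itertools.tee(iterable)
    # Shift the second iterator one step behind the first.
    previous = itertools.chain([None], previous)
    return zip(current, previous)


# A list stands in for a file object here; any iterable of lines works.
lines = ["alias rm='rm -i'\n", "alias cp='cp -i'\n", "alias mv='mv -i'\n"]
for thisline, prevline in with_previous(lines):
    print(repr(prevline), "->", repr(thisline))
```

This only gives you access to the previous line next to the current one. It does not push the line back into the iterator, so the loop never processes a line twice.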
Python's file iterator does a lot of buffering, which advances the position in the file far ahead of your iteration. If you want to use file.tell(), you must do it "the old way":
with open(filename) as fileob:
    line = fileob.readline()
    while line:
        print fileob.tell()
        line = fileob.readline()
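To tie this back to the seek/tell rewind trick from the question, here is a rough sketch of it with readline(). The function name read_with_one_line_rewind and the rule "re-read the second line once" are invented purely for the demonstration. It opens the file in binary mode so that tell() is a plain byte offset:

```python
import os
import tempfile


def read_with_one_line_rewind(path):
    """Read lines with readline(), rewinding once to re-read the second line."""
    seen = []
    rewound = False
    with open(path, "rb") as fileob:
        while True:
            start = fileob.tell()   # offset of the line about to be read
            line = fileob.readline()
            if not line:
                break
            seen.append((start, line))
            if len(seen) == 2 and not rewound:
                fileob.seek(start)  # go back to the start of the line just read
                rewound = True
    return seen


if __name__ == "__main__":
    # Throwaway file so the demo does not depend on anything on disk.
    fd, demo_path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"a\nbb\nccc\n")
    try:
        for offset, line in read_with_one_line_rewind(demo_path):
            print(offset, line)
    finally:
        os.remove(demo_path)
```

The point is that tell() is called before each readline(), so the saved offset marks the start of that line and seek() can return to it.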