I normally avoid reading files like this:
    with open(file) as f:
        list_of_lines = f.readlines()
and use this type of code instead.
    f = open(file)
    for line in f:
        pass  # do something
The exception is when I only have to iterate over a few lines in a file (and I know which lines those are); then I think it is easier to take slices of list_of_lines. Now this has come back to bite me. I have a HUGE file (reading it into memory is not possible), but I don't need to iterate over all of the lines, just a few of them. I have code that finds where my first line is and how many lines after it I need to edit. I just don't have any idea how to write this loop.
    n = ...      # grep for number of lines
    start = ...  # pattern match the start line
    f = open('big_file')
    # some loop over f from start to start + n
        # edit lines
EDIT: my title may have led to a debate rather than an answer.
If I understand your question correctly, the problem you're encountering is that storing all the lines of text in a list and then taking a slice uses too much memory. What you want is to read the file line by line while ignoring all but a certain set of lines (say, lines [17, 34), for example).
Try using enumerate to keep track of which line number you're on (counting from 0) as you iterate through the file. Here is a generator-based approach which uses yield to output only the interesting lines, one at a time:
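    def read_only_lines(f, start, finish):
        # enumerate counts lines from 0; yield only those in [start, finish)
        for ii, line in enumerate(f):
            if ii >= start and ii < finish:
                yield line
            elif ii >= finish:
                return  # stop reading once past the range

    f = open("big text file.txt", "r")
    for line in read_only_lines(f, 17, 34):
        print(line)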
This read_only_lines function basically reimplements itertools.islice from the standard library, so you could use that to make an even more compact implementation:
    from itertools import islice
    for line in islice(f, 17, 34):
        print(line)
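To make the indexing convention concrete, here is a small sketch that runs islice over an in-memory stand-in for a file. The names fake_file and picked are made up for illustration, and io.StringIO is only there so the example runs without a real file. It shows that the bounds are 0-based and that the stop index is excluded:

```python
import io
from itertools import islice

# In-memory stand-in for a file containing "line 0" .. "line 9"
# (hypothetical data, used only so the example is self-contained).
fake_file = io.StringIO("".join("line %d\n" % i for i in range(10)))

# islice(iterable, start, stop) yields items start .. stop - 1,
# counting from 0, just like a list slice [start:stop].
picked = [line.rstrip("\n") for line in islice(fake_file, 2, 5)]
print(picked)  # ['line 2', 'line 3', 'line 4']
```

Note that islice still has to read and discard the lines before the start index; it only avoids keeping them in memory.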
If you want to capture the lines of interest in a list rather than a generator, just pass the islice to list():
    from itertools import islice
    lines_of_interest = list(islice(f, 17, 34))
    do_something_awesome(lines_of_interest)
    do_something_else(lines_of_interest)
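Applied to the start and n values from the question, the same idea might look like the sketch below. The function name lines_in_range and its path parameter are hypothetical, and it assumes start is a 0-based line index and n is the number of lines to process:

```python
from itertools import islice

def lines_in_range(path, start, n):
    """Yield n lines of the file at path, beginning at 0-based index start.

    Hypothetical helper: only the current line is held in memory, and the
    file is closed when the generator finishes or is garbage collected.
    """
    with open(path) as f:
        # islice skips the first `start` lines and stops after `start + n`.
        for line in islice(f, start, start + n):
            yield line
```

With this in place, the loop from the question becomes something like `for line in lines_in_range('big_file', start, n):` followed by whatever edit you need to make to each line.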