Read large text files in Python, line by line, without loading them into memory

Posted 2018-12-31 10:24

I need to read a large file line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I don't want to use readlines() because it will create a very large list in memory.

Will the code below work for this case? Does xreadlines itself read into memory one line at a time? Is the generator expression even needed?

f = (line for line in open("log.txt").xreadlines())  # how much is loaded in memory?

f.next()  

Plus, what can I do to read the file in reverse order, like the Linux tail command?

I found:

http://code.google.com/p/pytailer/

and

"python head, tail and backward read by lines of a text file"

Both worked very well!
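For reference, here is a minimal sketch of the tail-style backward read those links implement; the helper name read_lines_reversed is my own, not theirs. It seeks to the end of the file and reads fixed-size chunks backwards, peeling off complete lines as they accumulate:

import os

def read_lines_reversed(path, chunk_size=8192):
    # yield the lines of a text file from last to first
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        buffer = b''
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            lines = buffer.split(b'\n')
            buffer = lines[0]  # may be a partial line; keep it for the next pass
            for line in reversed(lines[1:]):
                yield line.decode('utf-8')
        if buffer:
            yield buffer.decode('utf-8')

for line in read_lines_reversed('log.txt'):
    print(line)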

Tags: python
13 Answers
查无此人 · #2 · 2018-12-31 10:25
with open('filename', 'r') as f:
    # read() loads the entire file into memory before splitting
    for line in f.read().split('\n'):
        do_something_with(line)

Hope this helps.
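Note that read() above pulls the whole file into memory before splitting, which is exactly what the question is trying to avoid for a 5 GB file. A sketch of the streaming alternative, iterating the file object directly so only one line is held at a time:

with open('filename', 'r') as f:
    for line in f:  # the file object yields one line at a time
        do_something_with(line.rstrip('\n'))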

若你有天会懂 · #3 · 2018-12-31 10:27

An old-school approach:

fh = open(file_name, 'rt')
line = fh.readline()
while line:
    # do stuff with line
    line = fh.readline()
fh.close()
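For comparison, the same loop with a context manager, which closes the file even if the body raises; do_stuff_with is a placeholder:

with open(file_name, 'rt') as fh:
    # the file object is its own iterator, so lines are read on demand
    for line in fh:
        do_stuff_with(line)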
明月照影归 · #4 · 2018-12-31 10:28

Thank you! I have recently converted to Python 3 and have been frustrated by using readlines(0) to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess means it was in binary (bytes) format. Using decode('utf-8') changed it to ASCII.

Then I had to remove an "=\n" (a quoted-printable soft line break) from the middle of each line.

Then I split the chunk at the newlines.

# requires: import binascii
b_data = fh.read(ele[1])  # endat: one chunk of data, read as bytes
a_data = binascii.b2a_qp(b_data).decode('utf-8')  # chunk as quoted-printable text
data_chunk = a_data.replace('=\n', '').strip()  # soft line breaks removed
data_list = data_chunk.split('\n')  # list containing the lines in this chunk
for line_of_data in data_list:  # iterate through data_list to get each line
    i += 1  # running line counter, initialised outside this fragment
    print(line_of_data)

Here is the code, starting just above the print(data) line in Arohi's code (answer #7 below).

余生请多指教 · #5 · 2018-12-31 10:29

Please try this; the buffering argument just sets the I/O buffer size in bytes, and iterating over the file object still yields one line at a time:

with open('filename', 'r', buffering=100000) as f:
    for line in f:
        print(line)
泛滥B · #6 · 2018-12-31 10:30

You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html

From the docs:

import fileinput
for line in fileinput.input("filename"):
    process(line)

This will avoid copying the whole file into memory at once.
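fileinput also streams several files in sequence and can decode them on the fly; the file names below are placeholders, and hook_encoded is the stdlib helper for choosing an encoding:

import fileinput

with fileinput.input(files=('log1.txt', 'log2.txt'),
                     openhook=fileinput.hook_encoded('utf-8')) as f:
    for line in f:
        # filename() and filelineno() report where the current line came from
        print(fileinput.filename(), fileinput.filelineno(), line.rstrip('\n'))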

孤独总比滥情好 · #7 · 2018-12-31 10:34

How about this? Divide your file into chunks and then read it line by line. When you read a file, your operating system caches the next block of data; reading line by line does not take full advantage of that cache.

Instead, divide the file into chunks, load a whole chunk into memory, and then do your processing.

import os

def chunks(fh, size=1024):
    while True:
        startat = fh.tell()
        print(startat)  # the file object's current position from the start
        fh.seek(size, 1)  # offset from the current position --> whence=1
        data = fh.readline()  # read on to the end of the current line
        yield startat, fh.tell() - startat  # (start, length) only; doesn't store the data in memory
        if not data:  # readline() returned b'', so we are at end of file
            break

if os.path.isfile(fname):  # fname is the path to your file
    try:
        fh = open(fname, 'rb')
    except IOError as e:  # file --> permission denied
        print("I/O error({0}): {1}".format(e.errno, e.strerror))
    except Exception as e1:  # handle other exceptions such as attribute errors
        print("Unexpected error: {0}".format(e1))
    for ele in chunks(fh):
        fh.seek(ele[0])  # startat
        data = fh.read(ele[1])  # read up to endat
        print(data)
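If the start/length bookkeeping above is more than you need, a plain fixed-size chunked read can use the two-argument form of iter(), which keeps calling fh.read(65536) until it returns b'' at end of file; process is a placeholder, and 65536 is an arbitrary chunk size:

from functools import partial

with open(fname, 'rb') as fh:
    for chunk in iter(partial(fh.read, 65536), b''):
        process(chunk)  # placeholder for your own chunk handler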