Python - How can I open a file and specify the off

Posted 2019-01-26 05:59

I'm writing a program that will periodically parse an Apache log file to record its visitors, bandwidth usage, etc.

The problem is, I don't want to open the log and parse data I've already parsed. For example:

line1
line2
line3

If I parse that file, I'll save all the lines then save that offset. That way, when I parse it again, I get:

line1
line2
line3 - The log will open from this point
line4
line5

Second time round, I'll get line4 and line5. Hopefully this makes sense...

What I need to know is, how do I accomplish this? Python has the seek() function to specify the offset... So do I just get the file size of the log (in bytes) after parsing it, then use that as the offset (in seek()) the second time I parse it?

I can't seem to think of a way to code this >.<

8 answers
SAY GOODBYE
#2 · 2019-01-26 06:31

Easy but not recommended :):

last_line_processed = get_last_line_processed()
with open('file.log') as log:
    # Re-reads the whole file and skips the lines already handled.
    for record_number, record in enumerate(log):
        if record_number >= last_line_processed:
            parse_log(record)
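
The same skip-by-line-count idea can be sketched with itertools.islice from the standard library. The helper name lines_after and the demo file are made up for illustration. Note that the file is still read from the start on every run, so this has the same cost problem as the loop above.

```python
import itertools
import os
import tempfile


def lines_after(path, already_processed):
    """Return the lines of `path` that follow the first `already_processed` lines."""
    with open(path) as log:
        # islice still reads the skipped lines; it just does not return them.
        return list(itertools.islice(log, already_processed, None))


# Demo on a throwaway file (hypothetical data, not a real Apache log).
demo_path = os.path.join(tempfile.mkdtemp(), "file.log")
with open(demo_path, "w") as f:
    f.write("line1\nline2\nline3\nline4\nline5\n")

new_lines = lines_after(demo_path, 3)
print(new_lines)
```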
狗以群分
#3 · 2019-01-26 06:34

Here is code that tries both your length suggestion and the tell() method:

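beginning = """line1
line2
line3"""

end = """- The log will open from this point
line4
line5"""

# Write the first part and remember where it ends.
with open('log.txt', 'w') as openfile:
    openfile.write(beginning)
    endstarts = openfile.tell()

# Append the rest, then show the whole file.
with open('log.txt', 'a') as openfile:
    openfile.write(end)
with open('log.txt') as openfile:
    print(openfile.read())

print("\nAgain:")
with open('log.txt', 'r') as end2:
    # len(beginning) counts each newline once, but Windows stores it as
    # '\r\n', so this offset comes out two too small there.
    end2.seek(len(beginning))
    print(end2.read())

    # The value returned by tell() is right on Windows as well.
    end2.seek(endstarts)
    print("\nOk in Windows also")
    print(end2.read())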
家丑人穷心不美
#4 · 2019-01-26 06:34

If your log files fit easily in memory (that is, you have a reasonable rotation policy), you can do something like:

with open('logfile', 'r') as log:
    log_lines = log.readlines()
last_line = get_last_lineprocessed()  # from some persistent storage
last_line = parse_log(log_lines[last_line:])
store_last_lineprocessed(last_line)

If you cannot do this, you can use seek and tell instead. For an example of how to use them, see the accepted answer to "Get last n lines of a file with Python, similar to tail".
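
For the seek and tell route, here is a minimal sketch of what I would try. The function name read_new_lines and the demo file are hypothetical, and it assumes the log is only ever appended to. Opening in binary mode is my choice here so that offsets are plain byte counts on every platform.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for data appended after byte `offset`."""
    # Binary mode keeps offsets as plain byte counts, including on Windows.
    with open(path, "rb") as log:
        log.seek(offset)
        lines = log.readlines()
        return lines, log.tell()


# Demo on a throwaway file (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "access.log")
with open(demo_path, "wb") as f:
    f.write(b"line1\nline2\nline3\n")

first, offset = read_new_lines(demo_path, 0)

# Simulate the server appending to the log between two runs.
with open(demo_path, "ab") as f:
    f.write(b"line4\nline5\n")

second, offset = read_new_lines(demo_path, offset)
print(second)
```

In a real program the offset would have to be saved somewhere persistent between runs, for example in a small file next to the log.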

We Are One
#5 · 2019-01-26 06:38

If you're parsing your log line by line, you could just save the line number reached in the last parse. Next time, you would then just have to start reading from that line.

Seeking is more useful when you have to be at a very specific place in the file.

来,给爷笑一个
#6 · 2019-01-26 06:41

Here is an efficient and safe snippet that does this by saving the offset read so far in a parallel file. Basically logtail in Python.

import os

offset_filename = os.path.join(OFFSET_ROOT_DIR, filename)
if not os.path.exists(offset_filename):
    # First run: create the offset file, starting at offset 0.
    os.makedirs(os.path.dirname(offset_filename), exist_ok=True)
    with open(offset_filename, 'w') as offset_fd:
        offset_fd.write('0')
with open(filename) as log_fd, open(offset_filename, 'r+') as offset_fd:
    # An empty offset file is treated as offset 0.
    log_fd.seek(int(offset_fd.readline() or 0))
    new_logrows_handler(log_fd.readlines())
    # Overwrite the stored offset with the new position.
    offset_fd.seek(0)
    offset_fd.write(str(log_fd.tell()))
    offset_fd.truncate()
仙女界的扛把子
#7 · 2019-01-26 06:42

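Note that you can seek() in Python relative to the end of the file:

f.seek(-3, os.SEEK_END)

puts the read position 3 bytes (not lines) before the EOF. In Python 3 the file has to be opened in binary mode for a nonzero end-relative seek to work.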
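
A small check that the offset passed with os.SEEK_END is measured in bytes, not lines. The file name and contents are made up for the demo, and the file is opened in binary mode because Python 3 text files reject nonzero end-relative seeks.

```python
import os
import tempfile

# Throwaway demo file; each line is 6 bytes including its newline.
demo_path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(demo_path, "wb") as f:
    f.write(b"line1\nline2\nline3\n")

with open(demo_path, "rb") as f:
    # Move to 6 bytes before the end of the file, i.e. the start of the last line.
    f.seek(-6, os.SEEK_END)
    tail = f.read()

print(tail)
```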

However, why not use diff, either from the shell or with difflib?
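
A rough sketch of the difflib idea, assuming you keep a copy of the lines seen on the previous run. The two snapshots below are hypothetical stand-ins for the old and new contents of the log.

```python
import difflib

# Hypothetical snapshots: the log as of the last run, and the log now.
old_snapshot = ["line1\n", "line2\n", "line3\n"]
new_snapshot = old_snapshot + ["line4\n", "line5\n"]

# ndiff prefixes lines that exist only in the second sequence with "+ ".
added = [
    entry[2:]
    for entry in difflib.ndiff(old_snapshot, new_snapshot)
    if entry.startswith("+ ")
]

print(added)
```

This compares both versions in full, so it is likely to cost more than remembering an offset, but it does not depend on byte positions.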
