Continuously parse a file in Python

Published 2019-03-31 21:04

I'm writing a script that parses a file of HTTP traffic lines, extracts the domains, and currently just prints them to the screen. I'm using httpry to continuously write the traffic to a file. Here is the script I'm using to strip out the domain names:

#!/usr/bin/python

input = open("results.txt", "r")

for line in input:
    # the domain is expected in the seventh whitespace-separated field
    domain = line.split()[6]
    if domain != "-":
        print domain

While this script works great, I'd like a way to run it continuously, so that as new traffic gets added to the input file the script picks it up. I can't just run awk on the output of httpry, because I'm eventually going to be inserting these domains into a Mongo database, and I'll need the script to do that as well. If anyone could give me some ideas on how to constantly run this Python script on the output without reprinting previous entries, it would be much appreciated. Thanks.

2 Answers
贪生不怕死 · 2019-03-31 21:43

Node.js has a readline module that should handle this nicely:

var readline = require('readline')
  , fs = require('fs');

var input = process.stdin;   // or: fs.createReadStream('input.txt');
var output = process.stdout; // or: fs.createWriteStream('output.txt');

var reader = readline.createInterface({
  input: input,
  output: output,
  terminal: false
});

reader.on('line', function(line) {
  // the domain is the seventh whitespace-separated field
  output.write(line.split(/\s+/)[6] + '\n');
});

Save this in a .js file (domains.js, or whatever you name it) and run node domains.js, or pipe the file into it with cat file | node domains.js.

It should integrate nicely with mongodb in the future, too :)

地球回转人心会变 · 2019-03-31 21:55

Try this tail -f implementation, adapted from http://code.activestate.com/recipes/157035-tail-f-in-python/:

import time

# open the log file that httpry keeps appending to
logfile = open("results.txt", "r")

while 1:
    where = logfile.tell()
    line = logfile.readline()
    if not line:
        time.sleep(1)       # nothing new yet, wait and check again
        logfile.seek(where)
    else:
        print line,         # trailing comma: the line already ends with a newline
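
To tie this back to the question: a rough sketch of combining this kind of tail-follow loop with the field extraction and an eventual MongoDB insert (here via pymongo, assuming that's the driver you end up using) might look like the code below. The database and collection names are just placeholders; the results.txt path and the seventh-field layout are taken from your script.

import time
from pymongo import MongoClient  # assumes pymongo is installed

# placeholder connection details -- adjust to your setup
client = MongoClient("localhost", 27017)
domains = client["httpry"]["domains"]

logfile = open("results.txt", "r")
logfile.seek(0, 2)  # jump to the end so existing entries aren't reprocessed

while True:
    where = logfile.tell()
    line = logfile.readline()
    if not line:
        time.sleep(1)       # wait for httpry to append more traffic
        logfile.seek(where)
        continue
    fields = line.split()
    if len(fields) > 6 and fields[6] != "-":
        domain = fields[6]
        print(domain)
        domains.insert_one({"domain": domain})  # hypothetical document shape

Seeking to the end of the file first is what keeps previously logged entries from being reprinted; drop that seek if you want to process whatever is already in results.txt once at startup.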