Continuously parse a file in Python

Published 2019-03-31 21:04

I'm writing a script that parses a file of HTTP traffic lines, extracts the domains, and currently just prints them to the screen. I'm using httpry to continuously write the traffic to a file. Here is the script I'm using to strip out the domain names:

#!/usr/bin/python

input = open("results.txt", "r")

for line in input:
    # the domain is expected in the seventh whitespace-separated field
    domain = line.split()[6]
    if domain != "-":
        print domain

While this script works great, I'd like a way to run it continuously, so that as new traffic gets added to the input file the script picks it up. I can't just run awk on the output of httpry, because I'm eventually going to be inserting these domains into a Mongo database, and I'll need the script to do that as well. If anyone could give me some ideas on how to constantly run this Python script on the output without reprinting previous entries, it would be much appreciated. Thanks.

2 Answers
贪生不怕死 · 2019-03-31 21:43

Node.js has a readline module that should handle this nicely:

var readline = require('readline')
  , fs = require('fs');

var input = process.stdin;   // or: fs.createReadStream('input.txt');
var output = process.stdout; // or: fs.createWriteStream('output.txt');

var reader = readline.createInterface({
  input: input,
  output: output,
  terminal: false
});

reader.on('line', function(line) {
  // the domain is the seventh whitespace-separated field
  output.write(line.split(/\s+/)[6] + '\n');
});

Save this in a .js file (domains.js, or whatever you name it) and run node domains.js, or pipe the file into it with cat file | node domains.js.

It should integrate nicely with mongodb in the future, too :)

地球回转人心会变 · 2019-03-31 21:55

Try this tail -f implementation, adapted from http://code.activestate.com/recipes/157035-tail-f-in-python/:

import time

# open the log file that httpry keeps appending to
logfile = open("results.txt", "r")

while 1:
    where = logfile.tell()
    line = logfile.readline()
    if not line:
        time.sleep(1)       # nothing new yet, wait and check again
        logfile.seek(where)
    else:
        print line,         # trailing comma: the line already ends with a newline
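
To tie this back to the question: a rough sketch of combining this kind of tail-follow loop with the field extraction and an eventual MongoDB insert (here via pymongo, assuming that's the driver you end up using) might look like the code below. The database and collection names are just placeholders; the results.txt path and the seventh-field layout are taken from your script.

import time
from pymongo import MongoClient  # assumes pymongo is installed

# placeholder connection details -- adjust to your setup
client = MongoClient("localhost", 27017)
domains = client["httpry"]["domains"]

logfile = open("results.txt", "r")
logfile.seek(0, 2)  # jump to the end so existing entries aren't reprocessed

while True:
    where = logfile.tell()
    line = logfile.readline()
    if not line:
        time.sleep(1)       # wait for httpry to append more traffic
        logfile.seek(where)
        continue
    fields = line.split()
    if len(fields) > 6 and fields[6] != "-":
        domain = fields[6]
        print(domain)
        domains.insert_one({"domain": domain})  # hypothetical document shape

Seeking to the end of the file first is what keeps previously logged entries from being reprinted; drop that seek if you want to process whatever is already in results.txt once at startup.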