I'm writing a script that parses a file of HTTP traffic lines, extracts the domains, and for now just prints them to the screen. I'm using httpry to continuously write the traffic to a file. Here is the script I'm using to pull out the domain names:
#!/usr/bin/python
# Print the domain (7th whitespace-separated field) of each httpry log line.
with open("results.txt", "r") as log_file:
    for line in log_file:
        domain = line.split()[6]
        if domain != "-":
            print(domain)
While this script works great, I'd like a way to run it continuously, so that as new traffic gets added to the input file, the script picks up the new lines and extracts their domains. I can't just run awk on the output of httpry, because I'm eventually going to insert these domains into a Mongo database, and I'll need the script to do that as well. If anyone could give me some ideas on how to keep this Python script running against the output without reprinting previous entries, it would be much appreciated. Thanks.
Node.js has a nice readline module that should handle this nicely:
Save this in a .js file and run
    node domains.js
(or whatever you named it), or pipe the file into it:
    cat file | node domains.js
It should integrate nicely with MongoDB in the future, too :)
Try this tail -f implementation, as found at http://code.activestate.com/recipes/157035-tail-f-in-python/
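For illustration, here is a minimal sketch of the tail -f idea applied to the script from the question. It is not the code from the linked recipe. The names follow, extract_domain, and watch are hypothetical, the field index 6 and the file name results.txt are taken from the question, and the one-second polling interval is an assumption.

```python
import os
import time


def follow(path, poll_interval=1.0, from_start=False):
    """Yield complete lines appended to `path`, polling much like `tail -f`."""
    with open(path, "r") as handle:
        if not from_start:
            # Skip what is already in the file so old entries are not repeated.
            handle.seek(0, os.SEEK_END)
        pending = ""
        while True:
            chunk = handle.readline()
            if not chunk:
                # Nothing new yet: wait, then try again.
                time.sleep(poll_interval)
                continue
            pending += chunk
            # Only hand back a line once the writer has finished it.
            if pending.endswith("\n"):
                yield pending
                pending = ""


def extract_domain(line, field=6):
    """Return the domain field of a log line, or None if it is missing or "-"."""
    parts = line.split()
    if len(parts) <= field or parts[field] == "-":
        return None
    return parts[field]


def watch(path="results.txt"):
    """Print each new domain as it is written to `path` (runs until interrupted)."""
    for line in follow(path):
        domain = extract_domain(line)
        if domain is not None:
            print(domain)  # a database insert could go here instead
```

Calling watch("results.txt") while httpry is writing should print only domains from lines added after the script starts. Passing from_start=True to follow would also process the lines already in the file.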