I have a large log file that contains lines such as:
82.117.22.206 - - [08/Mar/2013:20:36:42 +0000] "GET /key/0/www.mysite.org.uk/ HTTP/1.0" 200 0 "-" "-"
From each line that matches the above pattern, I want to extract only the IP address 82.117.22.206
followed by a space and the host name www.mysite.org.uk.
The IP address and host name can differ from line to line. So given the above line, the line in the output file would be:
82.117.22.206 www.mysite.org.uk
How can I use grep or other commands in bash to do this, and to make the output unique so that the output file won't contain two identical lines? Can someone point me to a good place to start learning more about this kind of shell scripting?
With perl you can capture the parts
and call this as
This extracts the needed fields, sorts and eliminates duplicate lines.
If you figure out the regex to use, you could do something like:
Only, you'd cat your log instead of echoing a string.
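One way this could look, sketched with sed (the choice of sed and the exact regex are assumptions, since the answer leaves the regex open). The variable names `line` and `extract` and the file names in the last comment are placeholders:

```shell
# Sample line from the question; a real run would read the log file instead.
line='82.117.22.206 - - [08/Mar/2013:20:36:42 +0000] "GET /key/0/www.mysite.org.uk/ HTTP/1.0" 200 0 "-" "-"'

# \1 = leading IP address, \2 = path component after /key/<something>/.
# -n suppresses normal output; the trailing p prints only lines where
# the substitution actually matched.
extract='s#^([0-9.]+) .*"GET /key/[^/]+/([^/ ]+)/.*#\1 \2#p'

echo "$line" | sed -n -E "$extract"

# For a whole log, sort -u removes the duplicate lines:
# sed -n -E "$extract" access.log | sort -u > output.txt
```

For the sample line this prints `82.117.22.206 www.mysite.org.uk`. Passing the file name to sed directly avoids the extra cat, though `cat access.log | sed ...` behaves the same.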