I have a log file with a lot of lines in this format:
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"
My objective is simple: I want to output Alice's jw_token, and that's it.
So, my logic is that I need to find the lines that include id=alice and a status code of 200, then return the value of jw_token.
I actually managed to do this, but only with this absolute monstrosity of a line:
$ grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq
07e876afdc2245b53214fff0d4763730
This looks horrible, and may also break on a number of things (for instance, if "200" happens to appear anywhere else on the line). I know grep -P could have cleaned it up somewhat, but unfortunately that flag isn't available on my Mac.
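(For what it's worth, with GNU grep a sketch of that cleanup might look like the line below; the \w+ for the token and the lookahead for the status code are my assumptions about the format:

grep -oP 'id=alice&jw_token=\K\w+(?=.*" 200 )' main.log

\K discards everything matched so far from the output, and the lookahead checks that " 200 appears later on the line, so only the token itself is printed.)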
I also did it by including Python, like this:
cat << EOF > analyzer.py
import re
with open('main.log') as f:
    for line in f:
        if "id=alice" in line and " 200 " in line:
            print(re.search('(?<=jw_token\=).*?(?=\s)', line).group())
            break
EOF
python3 analyzer.py && rm analyzer.py
(This was actually MUCH (orders of magnitude) faster than the previous line with grep and sed. Why?)
Surely there are ways to make this a lot cleaner and prettier. How?
Could you please try the following? This should be an easy task for awk, in case you are OK with awk.
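A minimal sketch, assuming (as in the sample lines) that the status code is always the 8th whitespace-separated field and that the token is alphanumeric:

awk '/id=alice/ && $8 == 200 && match($0, /jw_token=[[:alnum:]]+/) { print substr($0, RSTART + 9, RLENGTH - 9) }' main.log

match() records where the pattern starts in RSTART and its length in RLENGTH; skipping the 9 characters of "jw_token=" leaves just the token.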
If you're open to a Perl one-liner:
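Something along these lines, assuming the token consists of word characters:

perl -ne 'print "$1\n" if /id=alice&jw_token=(\w+).*" 200 /' main.log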
Explanation: -n wraps the code in a loop that reads the input line by line; the regex captures the token into $1 and also requires " 200 (the status code immediately after the quoted request) later on the same line, so a 200 appearing elsewhere cannot cause a false match. The captured token is printed for each matching line.
Would you try the following:
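For instance, a sed-only sketch; -E works with the BSD sed on macOS too, and the [[:alnum:]] class for the token is an assumption:

sed -nE '/id=alice.*" 200 /s/.*jw_token=([[:alnum:]]+).*/\1/p' main.log

-n suppresses the default output, and the trailing p prints a line only when the substitution succeeded, i.e. matching lines reduced to just the token.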
You can achieve this by using just one grep and one sed, with this command:
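That is, the two parts described below, joined with a pipe:

grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log | sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'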
Here the first part,
grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log
will filter out all lines not having alice or not having status 200, and the next part,
sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'
will capture the token in group 1 and replace the whole line with just the token.