Effective grep of log file

Posted 2020-04-17 07:15

I have a log file with a lot of lines on this format:

10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"

My objective is simple: I want to output Alice's jw_token, and that's it.

So, my logic is that I need to find the lines that include id=alice and a status code of 200, then return the value of jw_token.

I actually managed to do this, but only with this absolute monstrosity of a line:

$ grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq
07e876afdc2245b53214fff0d4763730

This looks horrible, and may also break on a number of things (for instance if "200" happens to appear anywhere else on the line). I know grep -P could have cleaned it up somewhat, but unfortunately that flag isn't available on my Mac.
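For what it's worth, the BSD grep and sed that ship with macOS do support `-E` (POSIX extended regexes), which is enough for this job even without `-P`. A minimal sketch against one sample line from the question — the `[0-9a-f]+` token alphabet is an assumption based on the sample value:

```shell
# One sample line from the log above
line='10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"'

# Anchor the 200 to its position right after the HTTP version, so a stray
# "200" elsewhere on the line cannot match; then extract the token:
printf '%s\n' "$line" \
  | grep -E 'id=alice&jw_token=[0-9a-f]+ HTTP/[0-9.]+" 200 ' \
  | sed -E 's/.*jw_token=([0-9a-f]+).*/\1/'
```

Pinning the status code to its field position is what removes the "200 appears anywhere" fragility.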

I also did it by including Python, like this:

cat << EOF > analyzer.py
import re

with open('main.log') as f:
    for line in f:
        if "id=alice" in line and " 200 " in line:
            print(re.search('(?<=jw_token\=).*?(?=\s)', line).group())
            break
EOF
python3 analyzer.py && rm analyzer.py

(This was actually MUCH (orders of magnitude) faster than the previous line with grep and sed. Why?)
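(A plausible reason for the speed gap: the Python loop stops at the first hit because of the `break`, while every stage of the grep/sed pipeline reads the entire file. `grep -m 1`, supported by both GNU and BSD grep, gives the pipeline the same early exit. A sketch, recreating the sample lines above as `main.log`:)

```shell
# Recreate the three sample lines from the question:
cat > main.log <<'EOF'
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"
EOF

# -m 1 makes grep quit after the first matching line, like the Python break:
grep -m 1 -E 'id=alice&jw_token=[0-9a-f]+ .*" 200 ' main.log \
  | sed -E 's/.*jw_token=([0-9a-f]+).*/\1/'
```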

Surely there are ways to make this a lot cleaner and prettier. How?

4 Answers
我命由我不由天
#2 · 2020-04-17 07:44

Could you please try the following? This should be an easy task for awk, in case you are OK with awk.

awk '
/alice/ && match($0,/jw_token=[^ ]* HTTP\/1\.1\" 200/){  # line mentions alice AND has a token followed by status 200
  val=substr($0,RSTART+9,RLENGTH-9)  # drop the 9-char "jw_token=" prefix from the matched span
  split(val,array," ")               # token is the first space-separated field of what remains
  print array[1]
  delete array
}' Input_file
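To see it work, feed it the matching sample line from the question: `match()` sets `RSTART`/`RLENGTH` for the `jw_token=...` 200` span, `substr()` skips the 9-character `jw_token=` prefix, and `split()` keeps only the first space-separated word, i.e. the token:

```shell
# Run the answer's awk program on the one sample line that has a token and a 200:
printf '%s\n' '10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"' \
  | awk '
/alice/ && match($0,/jw_token=[^ ]* HTTP\/1\.1\" 200/){
  val=substr($0,RSTART+9,RLENGTH-9)
  split(val,array," ")
  print array[1]
  delete array
}'
```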
Melony?
#3 · 2020-04-17 07:47

If you're open to a Perl one-liner:

perl -ane '/id=alice&jw_token=([a-f0-9]+).+\b200\b/ && $h{$1}++;END{print"$_\n" for sort(keys %h)}' file.txt
07e876afdc2245b53214fff0d4763730

Explanation:

/                           # regex delimiter
    id=alice&jw_token=      # literal match
    ([a-f0-9]+)             # group 1: one or more hex digits
    .+                      # one or more of any character
    \b200\b                 # 200 surrounded by word boundaries
/                           # regex delimiter; add /i for case-insensitive matching
看我几分像从前
#4 · 2020-04-17 07:53

Would you try the following:

grep "id=alice.* 200 " main.log | sed 's/.*jw_token=\([^ ]\{1,\}\).*/\1/' | uniq
贼婆χ
#5 · 2020-04-17 07:54

You can achieve this with just one grep and one sed, using this command:

grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log|sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'|uniq

Here the first part, grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log, filters out every line that doesn't carry alice's token with status 200, and the second part, sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/', captures the token in group 1 and replaces the whole line with just the token.
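A quick sanity check of this command against the sample lines from the question, written to main.log first:

```shell
# Recreate the question's sample log:
cat > main.log <<'EOF'
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"
EOF

# Only the third line survives the grep; sed reduces it to the token:
grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log \
  | sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/' \
  | uniq
```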
