Parsing log lines using awk

Posted 2019-09-19 06:58

I have to parse some information out of the lines of a big log file. It's something like:

abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100   

There are many log lines like the above in the log files. I need to extract information such as: the datetime, i.e. 2012-03-03 11:12:12,457; the job details, i.e. 123.RPH.-101; the Query, i.e. get_data (no parameters); Rows, i.e. 10; and Time, i.e. 100.

So the output should look like:

2012-03-03 11:12:12,457|123|-101|get_data|10|100  

I have tried various permutations with awk but am not getting it right.

5 Answers
Melony?
#2 · 2019-09-19 07:34

TXR:

@(collect :vars ())
@file:@year-@mon-@day @hh:@mm:@ss,@ms @jobname[@job1.RPH.@job2] @queryname: Query=@query @params Rows=@{rows /[0-9]+/}Time=@time
@(output)
@year-@mon-@day @hh-@mm-@ss,@ms|@job1|@job2|@query|@rows|@time
@(end)
@(end)

Run:

$ txr data.txr data.log
2012-03-03 11-12-12,457|123|-101|get_data|10|100

Here is one way to make the program assert that every line in the log file must match the pattern. First, do not allow gaps in the collection, so that nonmatching material cannot be skipped just to pick out the lines which match:

@(collect :gap 0 :vars ())

Secondly, at the end of the script we add this:

@(eof)

This specifies a match on the end of file. If the @(collect) bails early because of a nonmatching line (due to the :gap 0 constraint), the @(eof) will fail and so the script will terminate with a failed status.

In this type of task, field splitting regex hacks will backfire because they can blindly produce incorrect results for some subset of the input being processed. If the input contains a vast number of lines, there is no easy way to check for mistakes. It's best to have a very specific match that is likely to reject anything which doesn't resemble the examples on which the pattern is based.
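The same fail-fast idea can be sketched in plain awk (this is only an approximation of the TXR mechanism; the validation regex below is a hypothetical one built from the sample line):

```shell
# Abort the whole run as soon as any line fails a strict pattern check.
printf '%s\n' \
  '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' \
  'this line does not belong here' |
awk '!/^[0-9-]+ [0-9:,]+ [A-Z]+\[[^]]+\] [A-Z]+: Query=.* Rows=[0-9]+Time=[0-9]+$/ {
    print "line " NR " does not match"
    exit 1
}'
# prints "line 2 does not match" and exits with status 1
```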

老娘就宠你
#3 · 2019-09-19 07:38

You just need the right field separators:

awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'

I'm assuming the "abc.log:" prefix is not actually in the log file, and that there is a space between Rows=10 and Time=100 (otherwise $15 would come out as "10Time").
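Under those assumptions, the one-liner can be checked directly on the sample line (note the field numbers depend on that exact layout):

```shell
# Feed the sample line (space added before Time=) through the suggested awk.
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' |
awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```

Because the field separator is a regex, consecutive separators (such as "] " after the job bracket) produce empty fields, which is why the wanted values land in $10, $15 and $17.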

狗以群分
#4 · 2019-09-19 07:48

Here's another, less fancy AWK solution (but it also works in mawk):

BEGIN { OFS="|" }

{
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)  # e.g. "123.RPH.-101"
    split(job, J, ".")                    # J[1] = "123", J[3] = "-101"
    split($5, X, "=")
    query = X[2]
    split($7, X, "=")
    rows = X[2]
    split($8, X, "=")
    time = X[2]

    print $1 " " $2, J[1], J[3], query, rows, time
}

Note that this assumes the Rows=10 and Time=100 strings are separated by a space, i.e. that there was a typo in the question's example.
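A self-contained check of this approach on the sample line (space before Time= assumed, and the bracketed job field split on "." so the output matches the format asked for):

```shell
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' |
awk '
BEGIN { OFS = "|" }
{
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)  # strip the surrounding [ ]
    split(job, J, ".")                    # J[1] = "123", J[3] = "-101"
    split($5, X, "="); query = X[2]
    split($7, X, "="); rows = X[2]
    split($8, X, "="); time = X[2]
    print $1 " " $2, J[1], J[3], query, rows, time
}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```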

手持菜刀,她持情操
#5 · 2019-09-19 07:53

Well, this is really horrible, but since sed is in the tags and there are no answers yet...

sed -r -e 's/[^0-9]*//' -e 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' -e 's/[^ ]* Query=/| /' -e 's/ [^ ]* Rows=/ | /' -e 's/Time=/ | /' my_logfile
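The same pipeline can be tried on the sample line via stdin (GNU sed assumed for -r; note this version strips the "abc.log:" prefix itself, but the output keeps spaces around the "|" separators):

```shell
printf '%s\n' 'abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' |
sed -r -e 's/[^0-9]*//' \
       -e 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' \
       -e 's/[^ ]* Query=/| /' \
       -e 's/ [^ ]* Rows=/ | /' \
       -e 's/Time=/ | /'
# → 2012-03-03 11:12:12,457 | 123 | -101 | get_data | 10 | 100
```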
chillily
#6 · 2019-09-19 07:57

My solution is in gawk: it uses the gawk extension to match() (the three-argument form, which fills an array with the capture groups).

You didn't give a specification of the file format, so you may have to adjust the regexes.

Script invocation: gawk -v OFS='|' -f script.awk

{
    match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+/)
    date_time = substr($0, RSTART, RLENGTH)

    # third argument to match() is a gawk extension: it collects the capture groups
    match($0, /\[([0-9]+)\.RPH\.(-?[0-9]+)\]/, matches)
    job_detail_1 = matches[1]
    job_detail_2 = matches[2]

    match($0, /Query=(\w+)/, matches)
    query = matches[1]

    match($0, /Rows=([0-9]+)/, matches)
    rows = matches[1]

    match($0, /Time=([0-9]+)/, matches)
    time = matches[1]

    print date_time, job_detail_1, job_detail_2, query, rows, time
}
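Because this approach matches against $0 rather than splitting fields, it handles the original line even without a space before Time=. An inline one-liner version of the same script, fed the sample line:

```shell
# Requires gawk for the 3-argument match().
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' |
gawk -v OFS='|' '{
    match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+/)
    date_time = substr($0, RSTART, RLENGTH)
    match($0, /\[([0-9]+)\.RPH\.(-?[0-9]+)\]/, m)
    match($0, /Query=(\w+)/, q)
    match($0, /Rows=([0-9]+)/, r)
    match($0, /Time=([0-9]+)/, t)
    print date_time, m[1], m[2], q[1], r[1], t[1]
}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```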