Parsing log lines using awk

Posted 2019-09-19 06:58

I have to parse some information out of the lines of a big log file. It's something like:

abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100   

There are many log lines like the above in the log files. I need to extract information such as: the datetime, i.e. 2012-03-03 11:12:12,457; the job details, i.e. 123.RPH.-101; the Query, i.e. get_data (no parameters); Rows, i.e. 10; and Time, i.e. 100.

So the output should look like:

2012-03-03 11:12:12,457|123|-101|get_data|10|100  

I have tried various permutations with awk but am not getting it right.

5 Answers
Melony?
#2 · 2019-09-19 07:34

TXR:

@(collect :vars ())
@file:@year-@mon-@day @hh:@mm:@ss,@ms @jobname[@job1.RPH.@job2] @queryname: Query=@query @params Rows=@{rows /[0-9]+/}Time=@time
@(output)
@year-@mon-@day @hh-@mm-@ss,@ms|@job1|@job2|@query|@rows|@time
@(end)
@(end)

Run:

$ txr data.txr data.log
2012-03-03 11-12-12,457|123|-101|get_data|10|100

Here is one way to make the program assert that every line in the log file must match the pattern. First, do not allow gaps in the collection, so that nonmatching material cannot be skipped just to pick out the lines which match:

@(collect :gap 0 :vars ())

Secondly, at the end of the script we add this:

@(eof)

This specifies a match on the end of file. If the @(collect) bails early because of a nonmatching line (due to the :gap 0 constraint), the @(eof) will fail and so the script will terminate with a failed status.

In this type of task, field splitting regex hacks will backfire because they can blindly produce incorrect results for some subset of the input being processed. If the input contains a vast number of lines, there is no easy way to check for mistakes. It's best to have a very specific match that is likely to reject anything which doesn't resemble the examples on which the pattern is based.
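The same fail-fast idea can be sketched in plain awk (this is only an approximation of the TXR mechanism; the validation regex below is a hypothetical one built from the sample line):

```shell
# Abort the whole run as soon as any line fails a strict pattern check.
printf '%s\n' \
  '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' \
  'this line does not belong here' |
awk '!/^[0-9-]+ [0-9:,]+ [A-Z]+\[[^]]+\] [A-Z]+: Query=.* Rows=[0-9]+Time=[0-9]+$/ {
    print "line " NR " does not match"
    exit 1
}'
# prints "line 2 does not match" and exits with status 1
```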

老娘就宠你
#3 · 2019-09-19 07:38

You just need the right field separators:

awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'

I'm assuming the "abc.log:" prefix is not actually in the log file, and that there is a space between Rows=10 and Time=100 (otherwise $15 would come out as "10Time").
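Under those assumptions, the one-liner can be checked directly on the sample line (note the field numbers depend on that exact layout):

```shell
# Feed the sample line (space added before Time=) through the suggested awk.
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' |
awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```

Because the field separator is a regex, consecutive separators (such as "] " after the job bracket) produce empty fields, which is why the wanted values land in $10, $15 and $17.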

狗以群分
#4 · 2019-09-19 07:48

Here's another, less fancy AWK solution (but it also works in mawk):

BEGIN { OFS="|" }

{
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)  # e.g. "123.RPH.-101"
    split(job, J, ".")                    # J[1] = "123", J[3] = "-101"
    split($5, X, "=")
    query = X[2]
    split($7, X, "=")
    rows = X[2]
    split($8, X, "=")
    time = X[2]

    print $1 " " $2, J[1], J[3], query, rows, time
}

Note that this assumes the Rows=10 and Time=100 strings are separated by a space, i.e. that there was a typo in the question's example.
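A self-contained check of this approach on the sample line (space before Time= assumed, and the bracketed job field split on "." so the output matches the format asked for):

```shell
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' |
awk '
BEGIN { OFS = "|" }
{
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)  # strip the surrounding [ ]
    split(job, J, ".")                    # J[1] = "123", J[3] = "-101"
    split($5, X, "="); query = X[2]
    split($7, X, "="); rows = X[2]
    split($8, X, "="); time = X[2]
    print $1 " " $2, J[1], J[3], query, rows, time
}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```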

手持菜刀,她持情操
#5 · 2019-09-19 07:53

Well, this is really horrible, but since sed is in the tags and there are no answers yet...

sed -r -e 's/[^0-9]*//' -e 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' -e 's/[^ ]* Query=/| /' -e 's/ [^ ]* Rows=/ | /' -e 's/Time=/ | /' my_logfile
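The same pipeline can be tried on the sample line via stdin (GNU sed assumed for -r; note this version strips the "abc.log:" prefix itself, but the output keeps spaces around the "|" separators):

```shell
printf '%s\n' 'abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' |
sed -r -e 's/[^0-9]*//' \
       -e 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' \
       -e 's/[^ ]* Query=/| /' \
       -e 's/ [^ ]* Rows=/ | /' \
       -e 's/Time=/ | /'
# → 2012-03-03 11:12:12,457 | 123 | -101 | get_data | 10 | 100
```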
chillily
#6 · 2019-09-19 07:57

My solution is in gawk: it uses the gawk extension to match() (the three-argument form, which fills an array with the capture groups).

You didn't give a specification of the file format, so you may have to adjust the regexes.

Script invocation: gawk -v OFS='|' -f script.awk

{
    match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+/)
    date_time = substr($0, RSTART, RLENGTH)

    # third argument to match() is a gawk extension: it collects the capture groups
    match($0, /\[([0-9]+)\.RPH\.(-?[0-9]+)\]/, matches)
    job_detail_1 = matches[1]
    job_detail_2 = matches[2]

    match($0, /Query=(\w+)/, matches)
    query = matches[1]

    match($0, /Rows=([0-9]+)/, matches)
    rows = matches[1]

    match($0, /Time=([0-9]+)/, matches)
    time = matches[1]

    print date_time, job_detail_1, job_detail_2, query, rows, time
}
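Because this approach matches against $0 rather than splitting fields, it handles the original line even without a space before Time=. An inline one-liner version of the same script, fed the sample line:

```shell
# Requires gawk for the 3-argument match().
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' |
gawk -v OFS='|' '{
    match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+/)
    date_time = substr($0, RSTART, RLENGTH)
    match($0, /\[([0-9]+)\.RPH\.(-?[0-9]+)\]/, m)
    match($0, /Query=(\w+)/, q)
    match($0, /Rows=([0-9]+)/, r)
    match($0, /Time=([0-9]+)/, t)
    print date_time, m[1], m[2], q[1], r[1], t[1]
}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```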